Abstract — We adopt universal composability, measured by Eve's distinguishability, as the secrecy criterion for secret key generation from a common random number shared by two distinct players without communication. Under this criterion, using the Rényi entropy of order $1+s$ for $s \in [0, 1]$, we derive a new upper bound on Eve's distinguishability under the application of universal hash functions. It is also shown that this bound gives the tight exponential rate of decrease in the case of independent and identical distributions. The result is applied to the wire-tap channel model and to secret key generation (distillation) by public discussion.

Index Terms — sacrifice bits, $L_1$ norm distance, universal composability, secret key distillation, universal hash functions, wire-tap channel



I. Introduction 

Random privacy amplification based on the universal 
condition] \\ has been studied by many authors|2|, 0, 0, 
0, QUI . 0. This technique is originally aimed for random 
number extraction ]2j, 0. It can be applied to secret key 
generation (distillation) with public communication||7], 0, 
0, ED, JED, 0, H and wire-tap channelE), lfl3l lfl4l. 
lfT31 . fl6l . ifTTl . which treats the secure communication in the 
presence of an eavesdropper. (For details of its application, see, e.g., the previous paper [6].) When random privacy amplification is implemented by universal hash functions, it can yield protocols for the above tasks with a relatively small amount of calculation.

Similar to the study 0, ll30l for random privacy am- 
plification based on the universal condition, the previous 
paper[6] focused only on the mutual information with the 
eavesdropper. However, as the secrecy criterion, many papers 
in cryptography community l22l . 0, 0, adopt the half 
of the $L_1$ norm distance, which is also called the variational distance or Eve's distinguishability. Because this criterion is closely related to universally composable security [22], it is called the universal composability, and from the viewpoint of the cryptography community the leaked information is required to be evaluated based on the $L_1$ norm distance.

In this paper, we adopt the $L_1$ norm distance as the secrecy criterion, i.e., the universal composability, and evaluate the secrecy of random privacy amplification. In the independent and identically distributed (i.i.d.) case, when the generation rate of random numbers is smaller than the entropy of the original information source, it is possible to generate a random variable whose $L_1$ norm distance to the uniform random number approaches zero asymptotically. However, in a realistic setting, we can manipulate only random variables of finite size, so the speed of this convergence is very important. In the information theory community, in order to discuss this speed, we often focus on the exponential rate of decrease. This rate is called the exponent, and is widely discussed among several
topics in information theory, e.g., channel coding|20|, source 
codinglO, OTl . and mutual information criterion in wire- 
tap channel lTT7l . |6). However, the exponent has not been 
discussed in the cryptography community as an important criterion. The purpose of this paper is to establish a systematic method for evaluating the exponent of the $L_1$ norm distance in secure protocols.

In Section III, we first focus on the evaluation of random privacy amplification by Bennett et al. [2], which employs the Rényi entropy of order 2. This evaluation was also obtained by Håstad et al. [30] and is often called the leftover hash lemma. Using a discussion similar to Renner [5], we derive an upper bound for the $L_1$ norm distance under the universal$_2$ condition for hash functions, which is the main theorem of this paper (Theorem 1).

Next, we apply this theorem to the i.i.d. setting with a given generation rate and a given source distribution. Then, we derive a lower bound on the exponent of the average of the $L_1$ norm distance between the generated random number and the uniform random number when a family of universal hash functions is applied. Next, we introduce a stronger condition for hash functions, which is called strongly universal. We consider the $n$-fold independent and identical extension, and show that the exponential rate of decrease for this bound is tight
under a stronger condition by using the type method fl3l . 
which was invented by Csiszar and Korner |[T3l and is one 
of the standard methods in information theory. Since our bound realizes the optimal exponent, it is expected to be powerful even in the finite-length setting. However, if the protocol generating the random number is allowed to depend on the original distribution, there is a possibility of improving the exponent, while it is known that the asymptotic generation rate cannot be improved [26]. In Section IV, we derive the optimal exponent in this setting by using Cramér's theorem [27] and the type method [13]. Comparing these two exponents, we can compare the performance of the protocol based on universal hash functions with that of the protocol depending on the information source.

In Section V, we consider the case when an eavesdropper has a random variable correlated with the random variable of the authorized user. In this case, when the authorized user applies universal hash functions to his random variable, he obtains a secure random variable. When we apply Theorem 1 to the security measured by the $L_1$ norm distance in this setting, we obtain an evaluation (59) that is tighter than the one directly obtained from the previous paper [6].

In Section VI, we focus on the wire-tap channel model, whose capacity has been calculated by Wyner [12] and Csiszár and Körner [13]. Csiszár [14] showed the strong security, and many
papers fS), [33), ([34 1 treat this model with mutual information 
criterion. The previous paper [17] derived bounds on the exponential rates of decrease for the security criterion based on the $L_1$ norm distance as well as for the mutual information between Alice and Eve. In this paper, we apply (59) to the wire-tap channel model and obtain an evaluation of the exponent of the $L_1$ security criterion. In Section VII, it is shown that the evaluation obtained in this paper is better than that of the previous paper [17]. In a realistic setting, it is natural to restrict our codes to linear codes. In Section VIII, using (67), we provide a security analysis for a code constructed by combining an arbitrary linear code with privacy amplification by universal hash functions. This analysis yields the exponential rate of decrease for the $L_1$ security criterion. Overall, since (59) and (67) are derived from Theorem 1, all of the obtained results concerning the wire-tap channel model can be regarded as consequences of Theorem 1.

Further, in Section IX, we obtain a bound for the $L_1$ security criterion in one-way secret key generation. In Appendix A, we prove Theorem 2, stated in Subsection III-B. In Appendix B, we prove Lemma 6, given in Section IV.

Relation with the previous paper [6]

The main difference from the previous paper [6] is that the analysis in this paper is based on the universal composability, while that in the previous paper [6] is based on the mutual information criterion. As the first step, this paper derives an evaluation (Theorem 1) of the quality of the uniform random number generated by universal hash functions, based on the $L_1$ norm criterion. Applying Theorem 1, we treat several security problems. Since this paper treats the same security problems as the previous paper under a different criterion, some of the protocols used in this paper were also used in the previous paper [6]. That is, the coding protocols used in Sections VI, VIII, and IX are used in Sections III, V, and VI of [6], respectively. While these protocols are described in [6], we describe them in full in this paper for the readers' convenience.

For uniform random number generation, this paper gives the tight exponential rate of decrease for the $L_1$ norm distance, while the previous paper [6] gives a lower bound on the exponential rate of decrease based on the Shannon entropy. Concerning secret key generation without communication, this paper gives a lower bound on the exponential rate of decrease based on the universal composability, while the previous paper [6] gives a lower bound on the exponential rate of decrease based on the mutual information criterion. Applying the Pinsker inequality (5), we can derive a lower bound on the exponential rate of decrease based on the universal composability from the lower bound in [6]. As is shown in Lemma 8 in Subsection V-B, our lower bound is (strictly) better than the combination of the Pinsker inequality and the lower bound in [6] (except for special cases). Note that applying the Pinsker inequality (5) or (6) yields half of the lower bound on the exponent of the mutual information as a lower bound on the exponent of the universal composability. Indeed, we give a numerical example in Fig. 2, in which our bound is strictly better than that of [6].

Concerning the wire-tap channel in a general framework, the code given in this paper is quite similar to that in the previous paper [6]. However, the evaluation method in this paper is different from that of the previous paper [6], because the analysis in this paper is based on the universal composability while that in the previous paper [6] is based on the mutual information. In this model, we can derive a lower bound on the exponential rate of decrease based on the universal composability by combining the Pinsker inequality (5) with the result in [6]. As is shown in Section VII, our lower bound is better than this lower bound derived from [6]. Section VIII treats a more realistic setting by using linear codes. Even in this setting, as is explained in Remark 1, our lower bound is (strictly) better than the lower bound derived from [6] (except for the special cases mentioned in Lemma 8). The same observation can be applied to secret key generation by public communication, which is discussed in Section IX.

II. Preliminaries 

First, we briefly explain several notations and basic facts in information theory. In order to evaluate the difference between two distributions $P^X$ and $\tilde{P}^X$, we employ the following quantities: the $L_1$ distance (variational distance)

$$d_1(P^X, \tilde{P}^X) := \sum_x |P^X(x) - \tilde{P}^X(x)|, \tag{1}$$

the $L_2$ distance

$$d_2(P^X, \tilde{P}^X) := \sqrt{\sum_x (P^X(x) - \tilde{P}^X(x))^2}, \tag{2}$$

and the KL-divergence

$$D(P^X \| \tilde{P}^X) := \sum_x P^X(x) (\log P^X(x) - \log \tilde{P}^X(x)). \tag{3}$$

These definitions can be extended to the case when the total measure is less than 1, i.e., $\sum_a P^A(a) < 1$. In the following, we call such $P^A$ a sub-distribution. This extension to sub-distributions is crucial for the later discussion.

When a joint distribution $P^{X,Y}$ is given, we have the following equation:

$$d_1(P^{X,Y}, P^X \times P^Y) = \sum_y \sum_x |P^{X,Y}(x,y) - P^X(x) P^Y(y)| = \sum_y P^Y(y)\, d_1(P^{X|Y=y}, P^X). \tag{4}$$



When $P^X$ and $\tilde{P}^X$ are normalized distributions, the Pinsker inequality

$$d_1(P^X, \tilde{P}^X)^2 \le D(P^X \| \tilde{P}^X) \tag{5}$$

is known [19] as a relation between the KL-divergence and the $L_1$ distance. That is,

$$-\log d_1(P^X, \tilde{P}^X) \ge -\frac{1}{2} \log D(P^X \| \tilde{P}^X). \tag{6}$$

These relations will be helpful for the later discussions.
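As a small numerical illustration of the quantities (1) to (3) and of identity (4), the following sketch evaluates them on a toy joint distribution. The distribution `joint` and all function names are hypothetical choices made for this example. The Pinsker-type check uses the textbook constant for the unhalved distance and natural logarithms, $d_1^2 \le 2D$; this normalization is an assumption of the sketch, and a constant factor only shifts (6) by an additive constant, so exponents are unaffected.

```python
import math

def d1(p, q):
    """L1 distance (1): sum of absolute differences, without a factor 1/2."""
    return sum(abs(a - b) for a, b in zip(p, q))

def d2(p, q):
    """L2 distance (2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl(p, q):
    """KL divergence (3) in nats; terms with p(x) = 0 contribute 0."""
    return sum(a * (math.log(a) - math.log(b)) for a, b in zip(p, q) if a > 0)

# Hypothetical joint distribution P^{X,Y} on {0,1} x {0,1}; rows are indexed by x.
joint = [[0.30, 0.20],
         [0.10, 0.40]]
px = [sum(row) for row in joint]
py = [sum(joint[x][y] for x in range(2)) for y in range(2)]

# Left-hand side of (4): distance between the joint and the product distribution.
flat_joint = [joint[x][y] for x in range(2) for y in range(2)]
flat_prod = [px[x] * py[y] for x in range(2) for y in range(2)]
lhs = d1(flat_joint, flat_prod)

# Right-hand side of (4): average over y of d1(P^{X|Y=y}, P^X).
rhs = sum(py[y] * d1([joint[x][y] / py[y] for x in range(2)], px)
          for y in range(2))

# KL divergence between joint and product, i.e. the mutual information I(X:Y).
divergence = kl(flat_joint, flat_prod)
```

Here `lhs` and `rhs` agree up to rounding, which is exactly the statement of (4).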



III. Uniform random number generation 
A. Protocol based on universal hash function: Direct part 

First, we consider the problem of generating a uniform random number from a biased random number $a \in \mathcal{A}$, which obeys a probability distribution $P^A$, when the cardinality $|\mathcal{A}|$ is finite. There are two types of protocols for this problem. One is a protocol specialized for the given distribution $P^A$. The other is a universal protocol that does not depend on the given distribution $P^A$. The aim of this section is to evaluate the performance of the latter. In the latter setting, our protocol is given by a function $f$ from $\mathcal{A}$ to $\mathcal{M} = \{1, \ldots, M\}$.

The quality of a random number obeying the sub-distribution $P^A$ is evaluated by

$$d_1(P^A) := d_1(P^A, P^A(\mathcal{A}) P^A_{\mathrm{mix}}), \tag{7}$$

where $P^A_{\mathrm{mix}}$ is the uniform distribution on $\mathcal{A}$. We also use the Rényi entropy of order $1+s$:

$$H_{1+s}(A|P^A) := -\frac{1}{s} \log \sum_a P^A(a)^{1+s}.$$

The $L_2$ distance is written by using the Rényi entropy of order 2 as follows:

$$d_2(P^A, P^A(\mathcal{A}) P^A_{\mathrm{mix}})^2 = e^{-H_2(A|P^A)} - \frac{P^A(\mathcal{A})^2}{|\mathcal{A}|}. \tag{8}$$
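The relation (8) between the $L_2$ distance and the Rényi entropy of order 2 can be checked numerically. The sketch below does so for a hypothetical sub-distribution `p_sub`; the function names are illustrative only, and the identity is taken in the form "squared $L_2$ distance equals $e^{-H_2}$ minus $P^A(\mathcal{A})^2/|\mathcal{A}|$".

```python
import math

def renyi_entropy(p, s):
    """Renyi entropy of order 1+s in nats: -(1/s) log sum_a p(a)^(1+s), for s > 0."""
    return -math.log(sum(a ** (1 + s) for a in p if a > 0)) / s

def d2_to_scaled_uniform_sq(p):
    """Squared L2 distance between p and P(A) times the uniform distribution."""
    total = sum(p)
    return sum((a - total / len(p)) ** 2 for a in p)

# Hypothetical sub-distribution on 4 symbols (total mass 0.9 < 1).
p_sub = [0.4, 0.3, 0.15, 0.05]
total = sum(p_sub)

lhs = d2_to_scaled_uniform_sq(p_sub)
rhs = math.exp(-renyi_entropy(p_sub, 1.0)) - total ** 2 / len(p_sub)

# For a normalized distribution the Renyi entropy does not increase with the order.
p_norm = [0.5, 0.3, 0.2]
h_half, h_one = renyi_entropy(p_norm, 0.5), renyi_entropy(p_norm, 1.0)
```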

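Now, we focus on an ensemble of functions $f_{\mathbf{X}}$ from $\mathcal{A}$ to $\mathcal{M} = \{1, \ldots, M\}$, where $\mathbf{X}$ denotes a random variable describing the stochastic behavior of the function $f_{\mathbf{X}}$. In this case, we adopt the following quantity as the criterion of secrecy:

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) = \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) = d_1(P^{B,\mathbf{X}}, P^A(\mathcal{A}) P^{B}_{\mathrm{mix}} \times P^{\mathbf{X}}), \tag{9}$$

where $B$ is the random variable $f_{\mathbf{X}}(A)$ and the final equation follows from (4). Hence, when the expectation $\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)})$ is sufficiently small, the random variable $f_{\mathbf{X}}(A)$ is almost independent of the side information $\mathbf{X}$. Then, the choice of $f_{\mathbf{X}}$ can be communicated between Alice and Bob without revealing anything about $f_{\mathbf{X}}(A)$.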

An ensemble of hash functions $f_{\mathbf{X}}$ is called universal when it satisfies the following condition [1]:

Condition 1 (Universal): For any distinct $a_1, a_2 \in \mathcal{A}$, the collision probability that $f_{\mathbf{X}}(a_1) = f_{\mathbf{X}}(a_2)$ is at most $\frac{1}{M}$.

We sometimes require the following additional condition:

Condition 2: For any $\mathbf{X}$, the cardinality of $f_{\mathbf{X}}^{-1}\{i\}$ does not depend on $i$.

This condition will be used in Section IV.

Indeed, when the cardinality $|\mathcal{A}|$ is a power of a prime power $q$ and $M$ is another power of the same prime power $q$, as is shown in Appendix II of the previous paper [6], the ensemble $\{f_{\mathbf{X}}\}$ can be given by the concatenation of a Toeplitz matrix and the identity, $(\mathbf{X}, I)$ [18], with only $\log_q |\mathcal{A}| - 1$ random variables taking values in the finite field $\mathbb{F}_q$. That is, the function can be obtained by multiplication by the random matrix $(\mathbf{X}, I)$ taking values in $\mathbb{F}_q$. In this case, Condition 2 can be confirmed because the rank of $(\mathbf{X}, I)$ is constant.
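The Toeplitz-plus-identity construction can be sketched for $q = 2$ as follows. This is a minimal illustration under stated assumptions, not the construction of [6] or [18] verbatim: the indexing convention `seed[i - j + k - 1]` for the Toeplitz part, the bit-tuple representation, and the function names are all choices made here. For $n = \log_2 |\mathcal{A}|$ input bits and $m = \log_2 M$ output bits, the seed has $n - 1$ bits, matching the count given above, and the sketch checks Condition 1 exhaustively for a small case.

```python
from itertools import product

def toeplitz_identity_hash(seed, a, m):
    """Apply the m x n binary matrix (T, I) to the bit vector a over F_2.

    T is the m x k Toeplitz matrix with T[i][j] = seed[i - j + k - 1], where
    k = n - m, so the seed has m + k - 1 = n - 1 bits; I is the m x m identity.
    """
    n = len(a)
    k = n - m
    out = []
    for i in range(m):
        bit = a[k + i]                              # identity part
        for j in range(k):
            bit ^= seed[i - j + k - 1] & a[j]       # Toeplitz part
        out.append(bit)
    return tuple(out)

def max_collision_probability(n, m):
    """Largest collision probability over all pairs of distinct inputs."""
    seeds = list(product((0, 1), repeat=n - 1))
    inputs = list(product((0, 1), repeat=n))
    worst = 0.0
    for idx, a1 in enumerate(inputs):
        for a2 in inputs[idx + 1:]:
            hits = sum(toeplitz_identity_hash(s, a1, m)
                       == toeplitz_identity_hash(s, a2, m) for s in seeds)
            worst = max(worst, hits / len(seeds))
    return worst

# Small exhaustive check: |A| = 2^4 = 16 inputs, M = 2^2 = 4 outputs.
worst = max_collision_probability(n=4, m=2)
```

In this small case the worst collision probability does not exceed $1/M = 1/4$, as Condition 1 requires.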

Bennett et al. [2] essentially showed the following lemma.

Lemma 1: A family of universal hash functions $f_{\mathbf{X}}$ satisfies

$$\mathrm{E}_{\mathbf{X}} e^{-H_2(f_{\mathbf{X}}(A)|P^{f_{\mathbf{X}}(A)})} \le e^{-H_2(A|P^A)} + \frac{P^A(\mathcal{A})^2}{M}. \tag{10}$$

This was also shown by Håstad et al. [30] and is often called the leftover hash lemma.

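Now, we follow the derivation of Theorem 5.5.1 of Renner [5] when one classical random variable is given. The Schwarz inequality implies that

$$d_1(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \le \sqrt{M}\, d_2(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}).$$

The Jensen inequality yields that

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \le \sqrt{M} \sqrt{\mathrm{E}_{\mathbf{X}} d_2(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}})^2}.$$

Substituting (8) and (10) into the above inequality, we obtain

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) \le M^{\frac{1}{2}} e^{-\frac{H_2(A|P^A)}{2}}. \tag{11}$$

Using (11), we can show the following theorem as a generalization of (11).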

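Theorem 1: A family of universal hash functions $f_{\mathbf{X}}$ satisfies

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) \le 3 M^{\frac{s}{1+s}} e^{-\frac{s H_{1+s}(A|P^A)}{1+s}} \quad \text{for } 0 \le \forall s \le 1. \tag{12}$$

Substituting $s = 1$, we obtain

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) \le 3 M^{\frac{1}{2}} e^{-\frac{H_2(A|P^A)}{2}}. \tag{13}$$

Since the difference between (11) and (13) is only the coefficient, Theorem 1 can be regarded as a kind of generalization of the result (10) of Bennett et al. [2].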
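To see the bounds (11) and (12) at work on a concrete instance, the sketch below averages the criterion over the family of all maps from a 10-symbol alphabet to $M = 2$ outputs, which is universal because any two distinct inputs collide with probability exactly $1/M$. The source distribution `p` and the function names are hypothetical; the bounds are taken in the form $M^{1/2} e^{-H_2/2}$ and $3 M^{s/(1+s)} e^{-s H_{1+s}/(1+s)}$.

```python
import math
from itertools import product

def d1_to_uniform(p_out):
    """Criterion (7) for the hashed output: L1 distance to P(A) times uniform."""
    total = sum(p_out)
    return sum(abs(b - total / len(p_out)) for b in p_out)

def renyi_entropy(p, s):
    """Renyi entropy of order 1+s in nats, for s > 0."""
    return -math.log(sum(a ** (1 + s) for a in p if a > 0)) / s

def average_d1_all_functions(p, M):
    """E_X d_1(P^{f_X(A)}) when f_X is uniform over all maps A -> {0,...,M-1}."""
    total, count = 0.0, 0
    for f in product(range(M), repeat=len(p)):
        out = [0.0] * M
        for a, b in enumerate(f):
            out[b] += p[a]
        total += d1_to_uniform(out)
        count += 1
    return total / count

def theorem1_bound(p, M, s):
    """Right-hand side of (12): 3 M^{s/(1+s)} exp(-s H_{1+s} / (1+s))."""
    return 3 * M ** (s / (1 + s)) * math.exp(-s * renyi_entropy(p, s) / (1 + s))

p = [0.05] * 6 + [0.1] * 3 + [0.4]     # hypothetical source on 10 symbols
M = 2
avg = average_d1_all_functions(p, M)
bound_11 = math.sqrt(M) * math.exp(-renyi_entropy(p, 1.0) / 2)
bounds_12 = [theorem1_bound(p, M, s / 10) for s in range(1, 11)]
```

For such a tiny alphabet the bounds are loose; their value lies in the exponential behavior for $n$-fold extensions discussed later in this section.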
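Proof: For any $R' > 0$, we choose the subset $\Omega_{R'} := \{a \in \mathcal{A} \,|\, P^A(a) > e^{-R'}\}$, and define the sub-distribution $P^A_{R'}$ by

$$P^A_{R'}(a) := \begin{cases} 0 & \text{if } a \in \Omega_{R'} \\ P^A(a) & \text{otherwise.} \end{cases}$$

Since

$$d_1(P^A, P^A_{R'}) = P^A(\Omega_{R'}) \tag{14}$$

and

\begin{align*}
d_1(P^A_{R'}(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}})
&= d_1(0, (P^A(\mathcal{A}) - P^A_{R'}(\mathcal{A})) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \\
&= (P^A(\mathcal{A}) - P^A_{R'}(\mathcal{A}))\, d_1(0, P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \\
&= P^A(\mathcal{A}) - P^A_{R'}(\mathcal{A}) = P^A(\Omega_{R'}),
\end{align*}

the idea of "smoothing" by Renner [5] yields that

\begin{align*}
d_1(P^{f_{\mathbf{X}}(A)}) &= d_1(P^{f_{\mathbf{X}}(A)}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \\
&\le d_1(P^{f_{\mathbf{X}}(A)}, P^{f_{\mathbf{X}}(A)}_{R'}) + d_1(P^{f_{\mathbf{X}}(A)}_{R'}, P^A_{R'}(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) + d_1(P^A_{R'}(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}, P^A(\mathcal{A}) P^{f_{\mathbf{X}}(A)}_{\mathrm{mix}}) \\
&= 2 P^A(\Omega_{R'}) + d_1(P^{f_{\mathbf{X}}(A)}_{R'}).
\end{align*}

Taking the expectation with respect to $\mathbf{X}$, we obtain

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) \le 2 P^A(\Omega_{R'}) + \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}_{R'}). \tag{15}$$

The inequality (11) yields

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}_{R'}) \le M^{\frac{1}{2}} e^{-\frac{1}{2} H_2(A|P^A_{R'})}.$$

For $0 \le s \le 1$, we can evaluate $e^{-H_2(A|P^A_{R'})}$ and $P^A(\Omega_{R'})$ as

\begin{align}
e^{-H_2(A|P^A_{R'})} &= \sum_{a \in \Omega_{R'}^c} P^A(a)^2 \le \sum_{a \in \Omega_{R'}^c} P^A(a)^{1+s} e^{-(1-s)R'} \nonumber \\
&\le \sum_{a} P^A(a)^{1+s} e^{-(1-s)R'} = e^{-s H_{1+s}(A|P^A) - (1-s)R'}, \tag{16} \\
P^A(\Omega_{R'}) &= \sum_{a \in \Omega_{R'}} P^A(a) \le \sum_{a \in \Omega_{R'}} P^A(a)^{1+s} e^{sR'} \nonumber \\
&\le \sum_{a} P^A(a)^{1+s} e^{sR'} = e^{-s H_{1+s}(A|P^A) + sR'}. \tag{17}
\end{align}

Combining (15), (16), and (17), for $R := \log M$, we obtain

\begin{align*}
\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)})
&\le 2 e^{-s H_{1+s}(A|P^A) + sR'} + e^{\frac{R}{2} + \frac{1}{2}(-s H_{1+s}(A|P^A) - (1-s)R')} \\
&= 3 e^{\frac{-s H_{1+s}(A|P^A) + sR}{1+s}},
\end{align*}

where we substitute $\frac{R + s H_{1+s}(A|P^A)}{1+s}$ into $R'$. ■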
Next, we consider the case when our distribution $P^{A_n}$ is given by the $n$-fold independent and identical distribution of $P^A$, i.e., $(P^A)^n$. When the random number generation rate $\lim_{n \to \infty} \frac{1}{n} \log M_n$ is $R$, we focus on the exponential rate of decrease of $\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)})$, and consider its supremum.

When an ensemble $\{f_{\mathbf{X},n}\}$ of hash functions is a family of universal hash functions from $\mathcal{A}^n$ to $\{1, \ldots, M_n\}$, Theorem 1 yields that

$$\liminf_{n \to \infty} \frac{-1}{n} \log \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \ge \frac{s H_{1+s}(A|P^A) - sR}{1+s}$$

for $s \in [0, 1]$. Taking the maximum over $s \in [0, 1]$, we obtain

$$\liminf_{n \to \infty} \frac{-1}{n} \log \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \ge \max_{0 \le s \le 1} \frac{s H_{1+s}(A|P^A) - sR}{1+s}. \tag{18}$$

On the other hand, when we apply the Pinsker inequality [19] to the upper bound for the mutual information obtained in the previous paper [6], we obtain another bound, $\max_{0 \le s \le 1} \frac{s H_{1+s}(A|P^A) - sR}{2}$, which is smaller than (18).
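The two exponents just compared can be evaluated numerically. The sketch below does this for a hypothetical binary source and a few rates below $H(A)$, maximizing over a grid of $s \in [0, 1]$; the grid resolution, the source, and the function names are assumptions of the example. It reads the first exponent as $\max_s s(H_{1+s} - R)/(1+s)$ and the Pinsker-based one as $\max_s s(H_{1+s} - R)/2$.

```python
import math

def renyi_entropy(p, s):
    """H_{1+s} in nats; the case s = 0 is the Shannon entropy."""
    if s == 0:
        return -sum(a * math.log(a) for a in p if a > 0)
    return -math.log(sum(a ** (1 + s) for a in p if a > 0)) / s

def exponent_universal(p, R, grid=1000):
    """max over s in [0,1] of s (H_{1+s} - R) / (1 + s), as in (18)."""
    return max(s * (renyi_entropy(p, s) - R) / (1 + s)
               for s in (i / grid for i in range(grid + 1)))

def exponent_via_pinsker(p, R, grid=1000):
    """max over s in [0,1] of s (H_{1+s} - R) / 2."""
    return max(s * (renyi_entropy(p, s) - R) / 2
               for s in (i / grid for i in range(grid + 1)))

p = [0.2, 0.8]                    # hypothetical binary source, H(A) ~ 0.50 nats
rates = [0.1, 0.2, 0.3, 0.4]      # nats per symbol
table = [(R, exponent_universal(p, R), exponent_via_pinsker(p, R)) for R in rates]
```

Since $1/(1+s) \ge 1/2$ on $[0, 1]$, the first exponent is never smaller than the second, which the table reflects.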



B. Protocol based on universal hash function: Converse part 

In order to show the tightness of the exponential rate of decrease (18) under the universal condition, we consider the following ensemble.

Condition 3 (Strongly universal): For any $a \in \mathcal{A}$, $\Pr\{f_{\mathbf{X}}(a) = m\} = \frac{1}{M}$. The random variable $f_{\mathbf{X}}(a)$ is independent of $\{f_{\mathbf{X}}(a')\}_{a' \neq a \in \mathcal{A}}$.

Theorem 2: Under a strongly universal ensemble, any subset $\Omega \subset \mathcal{A}$ with $|\Omega| \le M$ satisfies

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X}}(A)}) \ge \left(1 - \frac{|\Omega|}{M}\right)^2 P^A(\Omega). \tag{19}$$

Its proof is given in Appendix A.

In order to derive the inequality opposite to (18) from Theorem 2, we employ the type method [19]. In the type method, when an $n$-trial data $\mathbf{a}_n := (a_1, \ldots, a_n) \in \mathcal{A}^n$ is given, we focus on the distribution $p(a) := \frac{\#\{i \,|\, a_i = a\}}{n}$, which is called the empirical distribution of the data $\mathbf{a}_n$. In the type method, an empirical distribution is called a type. In the following, we denote the set of empirical distributions on $\mathcal{A}$ with $n$ trials by $T_n$. The cardinality $|T_n|$ is bounded by $(n+1)^{|\mathcal{A}|-1}$, which increases polynomially in the number $n$. That is,

$$\lim_{n \to \infty} \frac{1}{n} \log |T_n| = 0. \tag{20}$$

This property is the key idea in the type method. When $T_n(Q)$ represents the set of $n$-trial data whose empirical distribution is $Q$, the cardinality of $T_n(Q)$ can be evaluated as [19]

$$\left\lceil \frac{e^{nH(Q)}}{|T_n|} \right\rceil \le |T_n(Q)| \le \lfloor e^{nH(Q)} \rfloor, \tag{21}$$

where $\lceil x \rceil$ is the minimum integer $m$ satisfying $m \ge x$, and $\lfloor x \rfloor$ is the maximum integer $m$ satisfying $m \le x$. Since any element $\mathbf{a} \in T_n(Q)$ satisfies

$$P^{A_n}(\mathbf{a}) = e^{-n(D(Q\|P^A) + H(Q))}, \tag{22}$$

we obtain an important formula

$$\frac{1}{|T_n|} e^{-nD(Q\|P^A)} \le P^{A_n}(T_n(Q)) \le e^{-nD(Q\|P^A)}. \tag{23}$$
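The counting facts of the type method can be checked by brute force for a small alphabet. The following sketch enumerates all sequences of length $n = 6$ over a ternary alphabet, groups them by type, and tests the polynomial bound on the number of types together with the bounds (21) and (23), read as $e^{nH(Q)}/|T_n| \le |T_n(Q)| \le e^{nH(Q)}$ and $e^{-nD(Q\|P^A)}/|T_n| \le P^{A_n}(T_n(Q)) \le e^{-nD(Q\|P^A)}$. The source `p`, the length `n`, and the helper names are hypothetical.

```python
import math
from itertools import product

def type_classes(alphabet_size, n):
    """Group all sequences in A^n by their symbol counts (i.e. by their type)."""
    classes = {}
    for seq in product(range(alphabet_size), repeat=n):
        counts = tuple(seq.count(a) for a in range(alphabet_size))
        classes.setdefault(counts, []).append(seq)
    return classes

def entropy(q):
    return -sum(x * math.log(x) for x in q if x > 0)

def divergence(q, p):
    return sum(x * (math.log(x) - math.log(y)) for x, y in zip(q, p) if x > 0)

p = [0.5, 0.3, 0.2]      # hypothetical source distribution
n = 6
classes = type_classes(len(p), n)
num_types = len(classes)

tol = 1 + 1e-9           # slack for floating-point rounding
checks = []
for counts, members in classes.items():
    q = [c / n for c in counts]
    prob = sum(math.prod(p[a] for a in seq) for seq in members)  # P^n(T_n(Q))
    size_hi = math.exp(n * entropy(q))
    prob_hi = math.exp(-n * divergence(q, p))
    size_ok = size_hi / num_types <= len(members) * tol and len(members) <= size_hi * tol
    prob_ok = prob_hi / num_types <= prob * tol and prob <= prob_hi * tol
    checks.append(size_ok and prob_ok)
```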



Using the above knowledge, we can show the following 
proposition: 
Proposition 1: When $M_n = \lfloor e^{nR} \rfloor$, any sequence of strongly universal ensembles $\{f_{\mathbf{X},n}\}$ from $\mathcal{A}^n$ to $\{1, \ldots, M_n\}$ satisfies the inequality

$$\limsup_{n \to \infty} \frac{-1}{n} \log \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \le \min_{Q: H(Q) \le R} D(Q\|P^A), \tag{24}$$

where $D(Q\|P^A)$ is the Kullback-Leibler divergence $\sum_{a \in \mathcal{A}} Q(a)(\log Q(a) - \log P^A(a))$.

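Proof: Choose an arbitrary empirical distribution $Q \in T_n$ satisfying $H(Q) \le R$. Then, due to (21), the cardinality $|T_n(Q)|$ is at most $\lfloor e^{nR} \rfloor$. We choose a subset $\Omega_{n,Q}$ with cardinality $\lceil \frac{1}{2} e^{nR} \rceil$ so that it contains at least $\lceil \frac{1}{2} |T_n(Q)| \rceil$ elements of $T_n(Q)$. Using (21) and (22), we obtain

$$P^{A_n}(\Omega_{n,Q}) \ge \frac{|T_n(Q)|}{2} e^{-n(D(Q\|P^A) + H(Q))} \ge \frac{e^{nH(Q)}}{2|T_n|} e^{-n(D(Q\|P^A) + H(Q))} = \frac{e^{-nD(Q\|P^A)}}{2|T_n|}.$$

Using Theorem 2 with $\Omega_{n,Q}$, we obtain

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \ge \left(1 - \frac{\lceil \frac{1}{2} e^{nR} \rceil}{\lfloor e^{nR} \rfloor}\right)^2 \frac{e^{-nD(Q\|P^A)}}{2|T_n|}.$$

Since $Q$ is an arbitrary empirical distribution in $T_n$ satisfying $H(Q) \le R$,

$$\mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \ge \left(1 - \frac{\lceil \frac{1}{2} e^{nR} \rceil}{\lfloor e^{nR} \rfloor}\right)^2 \frac{1}{2|T_n|} \max_{Q \in T_n: H(Q) \le R} e^{-nD(Q\|P^A)}.$$

That is,

$$\frac{-1}{n} \log \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) \le \min_{Q \in T_n: H(Q) \le R} D(Q\|P^A) + \frac{1}{n} \log 2|T_n| - \frac{2}{n} \log\left(1 - \frac{\lceil \frac{1}{2} e^{nR} \rceil}{\lfloor e^{nR} \rfloor}\right).$$

Due to the continuity of $Q \mapsto H(Q)$ and $Q \mapsto D(Q\|P^A)$, and due to (20), the limit $n \to \infty$ yields (24). ■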
When $R < H(A|P^A)$, the equation

$$\max_{0 \le s} \frac{s(H_{1+s}(A|P^A) - R)}{1+s} = \min_{Q: H(Q) \le R} D(Q\|P^A) \tag{25}$$

is known as the strong converse exponent in the fixed 
source coding [19], [13), ED,|E] (A21)]. The maximum 

$\max_{0 \le s} \frac{s(H_{1+s}(A|P^A) - R)}{1+s}$ is realized at $s = s_0$ when

$$R = R_{s_0} := (1 + s_0) \frac{d}{ds}\left(s H_{1+s}(A|P^A)\right)\Big|_{s=s_0} - s_0 H_{1+s_0}(A|P^A).$$

Since $\frac{d}{ds} R_s = (1+s) \frac{d^2}{ds^2}\left(s H_{1+s}(A|P^A)\right) \le 0$, $R_s$ is monotone decreasing in $s$.

Thus, when $H(A|P^A) > R \ge R_1$ ($R_1$ is called the critical rate),

$$\max_{0 \le s} \frac{s(H_{1+s}(A|P^A) - R)}{1+s} = \max_{0 \le s \le 1} \frac{s(H_{1+s}(A|P^A) - R)}{1+s}. \tag{26}$$

Hence, in this case, due to (18), (24), (25), and (26), we obtain

$$\lim_{n \to \infty} \frac{-1}{n} \log \mathrm{E}_{\mathbf{X}} d_1(P^{f_{\mathbf{X},n}(A_n)}) = \max_{0 \le s \le 1} \frac{s(H_{1+s}(A|P^A) - R)}{1+s} = \min_{Q: H(Q) \le R} D(Q\|P^A). \tag{27}$$

However, when $R < R_1$,

$$\max_{0 \le s \le 1} \frac{s(H_{1+s}(A|P^A) - R)}{1+s} = \frac{H_2(A|P^A) - R}{2} < \max_{0 \le s} \frac{s(H_{1+s}(A|P^A) - R)}{1+s},$$

so the lower bound in (18) does not coincide with the upper bound in (24).
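The equality (25) between the Rényi-type maximization and the divergence minimization can be checked numerically for a binary source. In the sketch below, the minimization is solved by locating the boundary point $q_R$ with $h(q_R) = R$ by bisection, and the maximization is approximated on a grid truncated at `s_max`. The source parameter `p`, the rate `R`, the truncation, and the function names are assumptions of this example; the identity is taken in the form $\max_{s \ge 0} s(H_{1+s} - R)/(1+s) = \min_{Q: H(Q) \le R} D(Q\|P^A)$.

```python
import math

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else -q * math.log(q) - (1 - q) * math.log(1 - q)

def binary_divergence(q, p):
    total = 0.0
    for x, y in ((q, p), (1 - q, 1 - p)):
        if x > 0:
            total += x * (math.log(x) - math.log(y))
    return total

def renyi_entropy(p, s):
    if s == 0:
        return binary_entropy(p)
    return -math.log(p ** (1 + s) + (1 - p) ** (1 + s)) / s

def primal(p, R):
    """min of D(Q||P) over binary Q with H(Q) <= R, assuming p < 1/2 and R < h(p).

    The feasible set is q <= q_R or q >= 1 - q_R, where h(q_R) = R and q_R < 1/2.
    D(.||P) is convex, so the minimum sits at one of the two boundary points.
    """
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < R:
            lo = mid
        else:
            hi = mid
    return min(binary_divergence(lo, p), binary_divergence(1 - lo, p))

def dual(p, R, s_max=50.0, grid=50000):
    """max over s in [0, s_max] of s (H_{1+s} - R) / (1 + s), on a uniform grid."""
    return max(s * (renyi_entropy(p, s) - R) / (1 + s)
               for s in (s_max * i / grid for i in range(grid + 1)))

p, R = 0.2, 0.3          # hypothetical source and rate, R < h(0.2) ~ 0.50 nats
primal_value, dual_value = primal(p, R), dual(p, R)
```

The two values agree up to the grid resolution, consistent with (25).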

C. Comparison with evaluation by Holenstein-Renner [29]

In the above derivation, the key point is evaluating the probability $P^A(\Omega_{R'})$, which equals the probability $(P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\}$ in the $n$-i.i.d. setting. In the cryptography community, the $n$-i.i.d. setting is not regarded as an important setting because the community is more interested in the single-shot setting. In such a setting, the Holenstein-Renner [29] evaluation of $P^A(\Omega_{R'})$ is sometimes used. They proved the following theorem.

Theorem 3: When $0 \le H(A) - R' \le \log |\mathcal{A}|$,

$$(P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\} \le 2^{-\frac{n(H(A) - R')^2}{2(\log(|\mathcal{A}|+3))^2}}. \tag{28}$$

Further, When > 3 and < H{A) - R' < 1 ° s(l 1 ^ l ~ 1) , 

(P A ) n {a £ A n \{P A ) n {a) > e- nR } > —2 o^i^P . 

When |.4| = 2, the inequality yields the following evaluation. 
When < H(A) - P' < 



^ 24n(H(A)-R') 2 
' (log3) 2 



(P A )"{a e A n \(P A ) n {a) > e- nH } > —2 
for even n. 

Our evaluation (17) of $(P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\}$ contains the parameter $0 \le s \le 1$. Since this parameter is arbitrary, it is natural to compare the upper bound $\min_{0 \le s \le 1} e^{-n(s H_{1+s}(A|P^A) - sR')}$ given by (17) with that given by Theorem 3. That is, using (17), we obtain the exponential evaluation

$$\lim_{n \to \infty} \frac{-1}{n} \log (P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\} \ge \max_{0 \le s \le 1} s H_{1+s}(A|P^A) - sR',$$

while Theorem 3 yields that

$$\lim_{n \to \infty} \frac{-1}{n} \log (P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\} \ge \frac{(H(A) - R')^2}{2(\log(|\mathcal{A}|+3))^2} \log 2.$$

In this case, the upper bound is 121 °iog(|^|^i7)? ^ f° r l-^l ^ 3 
and 24 ^™-^ 2 for|^|=2. ° 
In fact, the probability $P^A(\Omega_{R'})$ is the key quantity in the method of information spectrum, which is a unified method in information theory [32]. When the method of information spectrum is applied to an i.i.d. source, the probability $P^A(\Omega_{R'})$ is evaluated by applying Cramér's theorem (see [27]) to the random variable $\log P^A(a)$. Then, we obtain

$$\lim_{n \to \infty} \frac{-1}{n} \log (P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\} = \max_{0 \le s} s H_{1+s}(A|P^A) - sR' \tag{29}$$

for $R' < H(A)$. Since $s \mapsto s H_{1+s}(A|P^A)$ is concave, when $H(A) > R' \ge \frac{d(s H_{1+s}(A|P^A))}{ds}\big|_{s=1}$, the maximum in (29) is attained with $s \in [0, 1]$, i.e.,

$$\lim_{n \to \infty} \frac{-1}{n} \log (P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\} = \max_{0 \le s \le 1} s H_{1+s}(A|P^A) - sR',$$

which implies that our evaluation (17) gives the tight bound on the exponential rate of decrease of the probability $(P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\}$. In fact, the difference among these bounds is shown numerically in Fig. 1. Therefore, we can conclude that our evaluation (17) is much better than that by Holenstein-Renner [29]. That is, the combination of Lemma 1 and (17) is essential for deriving the tight exponential bound.
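The $n$-fold form of the bound (17) can be compared with the exact probability for a binary source, where the probability of a sequence depends only on the number of zeros it contains. The sketch below is an illustration under assumed values of `p` and `R_prime`; it reads (17) in the i.i.d. setting as $(P^A)^n\{\mathbf{a} : (P^A)^n(\mathbf{a}) > e^{-nR'}\} \le e^{-n(s H_{1+s}(A|P^A) - sR')}$ and minimizes the right-hand side over a grid of $s \in (0, 1]$.

```python
import math

def renyi_entropy(p, s):
    """Renyi entropy of order 1+s (nats) of the binary source (p, 1-p), s > 0."""
    return -math.log(p ** (1 + s) + (1 - p) ** (1 + s)) / s

def exact_tail(p, n, R_prime):
    """(P^A)^n{a : (P^A)^n(a) > exp(-n R')} for a binary source with P(0) = p."""
    total = 0.0
    for k in range(n + 1):                      # k = number of zeros in a
        log_prob = k * math.log(p) + (n - k) * math.log(1 - p)
        if log_prob > -n * R_prime:
            total += math.comb(n, k) * math.exp(log_prob)
    return total

def chernoff_bound(p, n, R_prime, grid=1000):
    """min over s in (0,1] of exp(-n (s H_{1+s} - s R')), the n-fold form of (17)."""
    return min(math.exp(-n * (s * renyi_entropy(p, s) - s * R_prime))
               for s in (i / grid for i in range(1, grid + 1)))

p, R_prime = 0.2, 0.4          # hypothetical values with R' < H(A) ~ 0.50 nats
rows = [(n, exact_tail(p, n, R_prime), chernoff_bound(p, n, R_prime))
        for n in (10, 50, 100, 200)]
```

Each row holds the block length, the exact probability, and the bound; the exact value never exceeds the bound, and both decay exponentially in $n$.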

Fig. 1. Evaluation of $\lim_{n \to \infty} \frac{-1}{n} \log (P^A)^n\{\mathbf{a} \in \mathcal{A}^n \,|\, (P^A)^n(\mathbf{a}) > e^{-nR'}\}$ (vertical axis: exponential rate). Thick line: $\max_{0 \le s \le 1} s H_{1+s}(A|P^A) - sR'$ (the present paper). Normal line: $\frac{(H(A) - R')^2}{2(\log(|\mathcal{A}|+3))^2} \log 2$ (lower bound by [29]). Dashed line: upper bound by [29]. $p = 0.200$, $h(p) = H(A) = 0.500$, $\frac{d(s H_{1+s}(A))}{ds}\big|_{s=1} = 0.305$, $H(A) - \frac{\log 3}{24} = 0.455$.



IV. Specialized protocol for uniform random number generation

A. Main result of this section 

Next, we consider a function $f$ from $\mathcal{A}$ to $\{1, \ldots, M\}$ specialized to a given probability distribution $P^A$. This problem is called intrinsic randomness, and was studied for general sources by Vembu and Verdú [25]. The previous paper [25] discussed the relation between the second order asymptotic rate and the central limit theorem. In the following, for comparison with the exponential rate of decrease in (25), we prove the following theorem, which gives the optimal exponential rate of decrease for a given generation rate of uniform random numbers.

Theorem 4: When $\frac{d(s H_{1+s}(A|P^A))}{ds}\big|_{s=1} < R$, we obtain

$$\lim_{n \to \infty} \frac{-1}{n} \log \min_{f_n \in \mathcal{F}_n(R)} d_1(P^{f_n(A_n)}) = \max_{0 \le s \le 1} s(H_{1+s}(A|P^A) - R), \tag{30}$$

where $\mathcal{F}_n(R)$ is the set of functions $f_n$ from $\mathcal{A}^n$ to $\{1, \ldots, \lfloor e^{nR} \rfloor\}$.

Combining (27) and Theorem 4, we can compare the performance of a random universal protocol with that of the best specialized protocol. The exponential rate of decrease for the protocol based on universal hash functions differs from the optimal exponential rate of decrease for specialized protocols by the factor $\frac{1}{1+s}$ inside the maximization, and hence is smaller.
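The gap between the two exponents can be tabulated for a concrete source. The sketch below assumes a hypothetical binary source and rates above the threshold of Theorem 4 (about 0.305 nats for $p = 0.2$, as quoted in the caption of Fig. 1); it reads the specialized exponent as $\max_{0 \le s \le 1} s(H_{1+s} - R)$ and the universal one as $\max_{0 \le s \le 1} s(H_{1+s} - R)/(1+s)$. Function names and the grid resolution are illustrative.

```python
import math

def renyi_entropy(p, s):
    """H_{1+s} in nats; the case s = 0 is the Shannon entropy."""
    if s == 0:
        return -sum(a * math.log(a) for a in p if a > 0)
    return -math.log(sum(a ** (1 + s) for a in p if a > 0)) / s

def exponent_specialized(p, R, grid=1000):
    """max over s in [0,1] of s (H_{1+s} - R), the right-hand side of (30)."""
    return max(s * (renyi_entropy(p, s) - R)
               for s in (i / grid for i in range(grid + 1)))

def exponent_universal(p, R, grid=1000):
    """max over s in [0,1] of s (H_{1+s} - R) / (1 + s), as in (27)."""
    return max(s * (renyi_entropy(p, s) - R) / (1 + s)
               for s in (i / grid for i in range(grid + 1)))

p = [0.2, 0.8]                                   # hypothetical binary source
comparison = [(R, exponent_universal(p, R), exponent_specialized(p, R))
              for R in (0.35, 0.40, 0.45)]       # rates in nats per symbol
```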

In order to prove Theorem 4, we will show the following two inequalities:

$$\limsup_{n \to \infty} \frac{-1}{n} \log \min_{f_n \in \mathcal{F}_n(R)} d_1(P^{f_n(A_n)}) \le \max_{0 \le s \le 1} s(H_{1+s}(A|P^A) - R), \tag{31}$$

$$\liminf_{n \to \infty} \frac{-1}{n} \log \min_{f_n \in \mathcal{F}_n(R)} d_1(P^{f_n(A_n)}) \ge \max_{0 \le s \le 1} s(H_{1+s}(A|P^A) - R). \tag{32}$$

Inequality (31) is called the converse part and Inequality (32) is called the direct part in the information theory community. In order to show the respective inequalities, we prepare the respective lemmas (Lemmas 2 and 4) in the non-asymptotic setting in Subsection IV-B. In Subsection IV-C, using Lemma 2 and the concavity property, we show the converse part (31). Also, using Lemma 4, we show the direct part (32). In the latter derivation, we employ the method of types, which is one of the standard methods in information theory [19].

B. Non-asymptotic evaluation 

In order to treat the non-asymptotic case, we introduce the notation

$$[x]_+ := \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases}$$

Then, the $L_1$ norm for two normalized distributions $P$ and $Q$ can be simplified to

$$\sum_a |P(a) - Q(a)| = 2 \sum_a [P(a) - Q(a)]_+, \tag{33}$$

which is a useful formula for the following discussion.

Hence, we obtain the following lemma, which is useful for our proof of the converse part (31).

Lemma 2: Any probability distribution $P^A$ and any function $f$ from $\mathcal{A}$ to $\{1, \ldots, M\}$ satisfy

$$d_1(P^{f(A)}) \ge P^A\left\{a \in \mathcal{A} \,\Big|\, P^A(a) > \frac{2}{M}\right\}. \tag{34}$$

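Proof: Any positive numbers $\alpha_1, \ldots, \alpha_k$ satisfy

$$\left[\sum_{i=1}^{k} \alpha_i - \frac{1}{M}\right]_+ \ge \sum_{i=1}^{k} \left[\alpha_i - \frac{1}{M}\right]_+. \tag{35}$$

When $P^A(a) > \frac{2}{M}$, we have $P^A(a) - \frac{1}{M} > \frac{1}{M}$, which implies that

$$2\left[P^A(a) - \frac{1}{M}\right]_+ = 2\left(P^A(a) - \frac{1}{M}\right) \ge P^A(a) - \frac{1}{M} + \frac{1}{M} = P^A(a). \tag{36}$$

Thus, we obtain

\begin{align}
\sum_b \left|P^A(f^{-1}(b)) - \frac{1}{M}\right| = 2 \sum_b \left[P^A(f^{-1}(b)) - \frac{1}{M}\right]_+
&\ge 2 \sum_{a \in \mathcal{A}} \left[P^A(a) - \frac{1}{M}\right]_+ \tag{37} \\
&\ge 2 \sum_{a \in \mathcal{A}: P^A(a) > \frac{2}{M}} \left[P^A(a) - \frac{1}{M}\right]_+ \tag{38} \\
&\ge \sum_{a \in \mathcal{A}: P^A(a) > \frac{2}{M}} P^A(a), \nonumber
\end{align}

where (37) follows from (35), (38) holds because every term is non-negative, and the final inequality follows from (36). Therefore, we obtain (34). ■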
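Lemma 2 can be checked by brute force over all functions on a small alphabet. The sketch assumes that the threshold in (34) is $2/M$, so that the lemma reads $d_1(P^{f(A)}) \ge P^A\{a : P^A(a) > 2/M\}$ for every $f$; the distribution `p`, the value of `M`, and the function names are hypothetical.

```python
from itertools import product

def d1_to_uniform(p, f, M):
    """d_1(P^{f(A)}): L1 distance between the output distribution and uniform."""
    out = [0.0] * M
    for a, b in enumerate(f):
        out[b] += p[a]
    return sum(abs(x - 1 / M) for x in out)

def lemma2_rhs(p, M):
    """P^A{a : P^A(a) > 2/M}, the right-hand side of (34) as assumed here."""
    return sum(x for x in p if x > 2 / M)

p = [0.7, 0.1, 0.1, 0.05, 0.05]     # hypothetical distribution on 5 symbols
M = 3

# Smallest achievable distance over all 3^5 functions f: A -> {0, 1, 2}.
best = min(d1_to_uniform(p, f, M) for f in product(range(M), repeat=len(p)))
rhs = lemma2_rhs(p, M)
```

Even the best function cannot push the distance below the mass of the symbols whose probability exceeds $2/M$, which is the content of the lemma.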
In order to show the direct part, we prepare the following lemma.

Lemma 3: Assume that, for two integers $M \ge N$, two sequences of positive numbers $\alpha_1, \ldots, \alpha_N$ and $\beta_1, \ldots, \beta_M$ satisfy $\sum_{i=1}^{N} \alpha_i \ge \sum_{j=1}^{M} \beta_j$. Then, there exists a map $f$ from $\{1, \ldots, M\}$ to $\{1, \ldots, N\}$ such that

$$\sum_{i=1}^{N} \left[\sum_{j \in f^{-1}(i)} \beta_j - \alpha_i\right]_+ \le N \max_j \beta_j. \tag{39}$$

Proof: First, we define $f(1) := 1$. For $j > 1$, we define $f(j)$ inductively. When the indices $j' < j$ already assigned to $f(j-1)$ satisfy $\sum_{j' \in f^{-1}(f(j-1))} \beta_{j'} < \alpha_{f(j-1)}$, we define $f(j) := f(j-1)$. Otherwise, we define $f(j) := f(j-1) + 1$. Then, the function $f$ satisfies the condition (39). ■
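The inductive construction in the proof of Lemma 3 is a greedy bin-filling rule, which the following sketch implements with exact rational arithmetic. It uses 0-based indices, and it assumes that the hypothesis of the lemma is that the total weight of the $\beta_j$ does not exceed the total capacity of the $\alpha_i$; the example sequences and function names are hypothetical.

```python
from fractions import Fraction

def greedy_map(alpha, beta):
    """Assign items j = 0..M-1 (weights beta) to bins i = 0..N-1 (capacities alpha).

    Item j joins the current bin while that bin's load is still below its
    capacity; otherwise the next bin is opened. Assumes sum(beta) <= sum(alpha),
    which keeps the bin index below N.
    """
    f = [0]
    load = beta[0]
    for j in range(1, len(beta)):
        current = f[-1]
        if load < alpha[current]:
            f.append(current)
            load += beta[j]
        else:
            f.append(current + 1)
            load = beta[j]
    return f

def total_excess(alpha, beta, f):
    """Left-hand side of (39): sum over bins of [load - capacity]_+."""
    loads = [Fraction(0)] * len(alpha)
    for j, i in enumerate(f):
        loads[i] += beta[j]
    return sum(max(load - a, Fraction(0)) for load, a in zip(loads, alpha))

alpha = [Fraction(1, 4)] * 4        # hypothetical capacities, N = 4
beta = [Fraction(1, 10)] * 9        # hypothetical weights, M = 9
f = greedy_map(alpha, beta)
excess = total_excess(alpha, beta, f)
```

Each bin overshoots its capacity by at most one item, which is why the total excess is bounded by $N \max_j \beta_j$.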

Now, we consider the case when our distribution $P^{A_n}$ is given by the $n$-fold independent and identical distribution of $P^A$, i.e., $(P^A)^n$. Using Lemma 3, we have the following lemma, which is useful for our proof of the direct part (32).

Lemma 4: For any probability distribution $P^A$, there exists a function $f_n$ from $\mathcal{A}^n$ to $\{1, \ldots, M_n\}$ such that

\begin{align}
d_1(P^{f_n(A_n)}) \le{}& 2 (P^A)^n\left\{\mathbf{a} \in \mathcal{A}^n \,\Big|\, (P^A)^n(\mathbf{a}) > \frac{1}{M_n}\right\} \nonumber \\
&+ 2 \sum_{Q \in T_n^1[M_n]} M_n e^{-n(D(Q\|P^A) + H(Q))} \cdot (P^A)^n(T_n(Q)) \nonumber \\
&+ 2 |T_n| \max_{Q \in T_n^2[M_n]} e^{-n(D(Q\|P^A) + H(Q))}, \tag{40}
\end{align}

where

$$T_n^1[M_n] := \left\{Q \in T_n \,\Big|\, D(Q\|P^A) + H(Q) \ge \frac{1}{n} \log M_n\right\},$$

$$T_n^2[M_n] := \left\{Q \in T_n \,\Big|\, (P^A)^n(T_n(Q)) < \frac{1}{M_n}\right\}.$$



Proof: In the first step, we define the function $f_n$. In the second step, we show that the function satisfies (40). We divide $T_n$ into three parts:

$$\hat{T}_n^0[M_n] := \{Q \in T_n \,|\, e^{n(D(Q\|P^A) + H(Q))} < M_n\},$$

$$\hat{T}_n^1[M_n] := \left\{Q \in (\hat{T}_n^0[M_n])^c \cap T_n \,\Big|\, (P^A)^n(T_n(Q)) \ge \frac{1}{M_n}\right\},$$

$$\hat{T}_n^2[M_n] := \left\{Q \in (\hat{T}_n^0[M_n])^c \cap T_n \,\Big|\, (P^A)^n(T_n(Q)) < \frac{1}{M_n}\right\},$$

where $(\hat{T}_n^0[M_n])^c$ is the complement of $\hat{T}_n^0[M_n]$. These three parts have the following relation with the above two parts:

$$\hat{T}_n^1[M_n] \subset T_n^1[M_n], \quad \hat{T}_n^2[M_n] \subset T_n^2[M_n].$$

By using the integer $n_Q := \left\lfloor \frac{(P^A)^n(T_n(Q))}{1/M_n} \right\rfloor = \lfloor M_n (P^A)^n(T_n(Q)) \rfloor$, the conditions for $\hat{T}_n^1[M_n]$ and $\hat{T}_n^2[M_n]$ are written as $n_Q \ge 1$ and $n_Q < 1$, respectively. Note that, since $n_Q$ is a non-negative integer, $n_Q < 1$ is equivalent to $n_Q = 0$.

Due to (22), the condition $e^{n(D(Q\|P^A) + H(Q))} < M_n$ is equivalent to the condition $P^{A_n}(\mathbf{a}) > \frac{1}{M_n}$ for $\mathbf{a} \in T_n(Q)$. Hence,

$$(P^A)^n\left\{\mathbf{a} \in \mathcal{A}^n \,\Big|\, (P^A)^n(\mathbf{a}) > \frac{1}{M_n}\right\} = \sum_{Q \in \hat{T}_n^0[M_n]} (P^A)^n(T_n(Q)). \tag{41}$$



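So,

$$(P^A)^n\left\{\mathbf{a} \in \mathcal{A}^n \,\Big|\, (P^A)^n(\mathbf{a}) > \frac{1}{M_n}\right\} + \sum_{Q \in \hat{T}_n^1[M_n]} \frac{n_Q}{M_n} \le \sum_{Q \in \hat{T}_n^0[M_n]} (P^A)^n(T_n(Q)) + \sum_{Q \in \hat{T}_n^1[M_n]} (P^A)^n(T_n(Q)) \le 1.$$

Since

$$\frac{1}{M_n} \sum_{Q \in \hat{T}_n^0[M_n]} |T_n(Q)| = \frac{1}{M_n} \left|\left\{\mathbf{a} \in \mathcal{A}^n \,\Big|\, (P^A)^n(\mathbf{a}) > \frac{1}{M_n}\right\}\right| \le (P^A)^n\left\{\mathbf{a} \in \mathcal{A}^n \,\Big|\, (P^A)^n(\mathbf{a}) > \frac{1}{M_n}\right\},$$

we have

$$\sum_{Q \in \hat{T}_n^0[M_n]} |T_n(Q)| + \sum_{Q \in \hat{T}_n^1[M_n]} n_Q \le M_n.$$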

Therefore, we can choose /„ on ff := 
U Qef I i ) [A/„]uf; i 1 [A/-,] Tln ( ( 3) satisfying the following conditions. 

1) For Q,Q' G ?; [Af„] U t x [M n ], f n (T n (Q)) n 
/4(T(Q')) = 0. 

2) /,'Jt„(Q) is injective for Q G T n °[M„]. 

3) |/;(P„(Q))| - n Q for Q G 7;, 1 [MJ. Further, we choose 

satisfying the additional condition. 

4) Any type Q G T;'^] satisfies that < 
for b G f n {T n {Q)). 

Then, for Q G 7^[M„], we obtain 

P /;(^) (6) < 1 +e -«(uWII^)+HW)) j V6 G f' n (T n (Q)). 

lVl n 

(42) 



x 



From the construction, Hence, Equality ((48) and Lemma [2] imply 

E pfLiAn) (b)>^\m% limsup^log min dx(P^) 

That is, < maxs( J ffi +s (^|P A ) - R). (49) 

' 0<s 

53 P A "(a) < ^-|(/;(^')) c |- (43) Since s h> sPa+^P- 4 ) is concave, 

" when d(sgl +; (A|P)) | s= i < R, the maximum 

Next, we define f n on the whole set by modifying f' n as max < s s(H 1+s (A\P A ) - R) is realized at s £ [0,1], i.e., 

follows. max < s <i s{H 1+s {A\P A ) - R) = max < s s(H 1+s (A\P A ) - 

5) /„ is the same as f' n on il'. R )- Therefore, we obtain the converse part <ED- 

6) Due to g3]l, we can apply Lemma [3] to the case when In order to show the direct part (32j, we will show the 
{1,...,N} = (X(ft')) c , {1, . . . , M} = (n') c , a b = following lemma by employing LemmaEl 



±- for b e (/£(fi')) c and j3 a = P A "(a) for a € (ft') c . Lemma 5: 



Following Lemma [3] we define the map / n |m')c from _j 

(nr to (/^')) c . ™ f V log /„^„%) dl(p } 

Our remaining task is to evaluate the value > max S (P 1+S (A|P A ) - P). (50) 

£jp/»(^)(&)- ^] + . Now, we define - < s <i v 1+sV v ; 

qiq\ !pf n (A n )n\ 1 i In order to show Lemma|5] we prepare the following lemma, 

^— -rf A/P whnsf nrnnf is divpn in Annendi x [r1 



M n whose proof is given in Appendix [ 

°*w™> ' Lemma 6: When d(agl +- (A|p » | 8=1 < R, 
Then, fiTt implies that 

_ A A i min ff(Q)+2D(Q||P)-P 

X] ^ ( P ) n {° G ^"l(P )"(«) > }• (44) Q:H(Q)+D{Q\\P)>R 

Qef°[M„] n = max sH 1+s (A\P) - sR 

For Q G 7^[M„], dHJ) implies = max sfli +a (A|P) - sR. (51) 

< s < 1 



When ^ 1+; (^l^)) | s=1 > p, 



C(Q) < nQe -nD{Q\\P A )-nH{Q) 

< Mn e~ nDmpA} - nH(Q) ■ (P A ) n (T n (Q)). (45) 

Thus, m and <E3 imply Q^Xlim*^ + ^ (QI|P) " fl 

y [P /„(A„) (6) _ J_ ]+ =P 2 (A|P)-P (52) 

6e ^0 M " = max sP 1+s (A|P)-sP. (53) 



<(P A )"{a € -4"|(P A )"(a) > — } Proof of Lemma^ Due to ©, (ED, and the continuity 

+ 53 Mne -nD(Q\\p A )-nH(Q) . (p^)»(p n (Q)) of Q h-> P(Q) and P(Q||P A ), we obtain 

Qer " 1[M " ] lim —log 2|T„| max e ~n{D { Q\\p-) + H {Q )) 

(46) n-s-oo n Qe7^[|e nil J] 

Recall the condition 6). Lemma [3] guarantees that = lim min D(Q\\P A ) + H(Q) 

n->-ooQ e 7^[Le««J] 

^i/^/L/NNci -«fPfoilP A l+.fPcw > min H{Q) + 2D(Q\\P A ) - R 

<l(/n( n )) I max e ( (y " Q:P(Q||P A )>P 

QeT " [M " 1 4 > min P/(Q) + 2D(Q||P A )-P. (54) 

<|TJ max e -n{D{Q\\P A )+H{Q)) _ (4?) " Q:H(Q)+B(Q||pA)> R 

ger„ 2 [M„] 

Combining d46l > and d47| i, we obtain ( |40] >. 



0<s<l 



C. Asymptotic evaluation QeT 1 [L e " H J] 

Next, we proceed to the asymptotic evaluation. First, using 
Cramer's theorem||27l, we obtain 



From d23l . 

K n := 53 [e nR \(P A r(T n (Q)) 



MD(Q\\P A )+H(Q)) 



satisfies that

: i t ^(pT(.^"l(p")- W > -!,( (48, <*» < T„ <36r3 j,„ / -"< 2D «"^' + '"«-' i >. 
Due to ® and the continuity of Q ^ H(Q) and D(Q\\P A ), 
lim — \ogK n 

n— »oc Tl 

min H(Q) + 2D(Q\\P A ) - R. (55) 

Q:H(Q)+D(Q\\PA)>R 

As is shown in Lemma 6, the RHSs of (54) and (55) equal $\max_{0 \le s \le 1} sH_{1+s}(A|P^A) - sR$. Since $\max_{0 \le s} sH_{1+s}(A|P^A) - sR \ge \max_{0 \le s \le 1} sH_{1+s}(A|P^A) - sR$, (48) implies that

Um ^\og(P A r{a e -4"|(P A )» > 



> max s( J ffi +s (A|P A ) - P). 

0<s<l 



(56) 



Thus, applying (54), (55), and (56) to the RHS of (40), and using Lemma 6, we can choose a sequence $\{f_n\}$ such that



lim inf — log min d\ {P^ 



(An)) 



> max s(H 1+s (A\P A ) - R), 

0<s<l 



(57) 



which implies 



V. Secret key generation without communication 
A. Application of Theorem 1

Next, we consider the secure key generation problem from a common random number $A \in \mathcal{A}$ which has been partially eavesdropped on by Eve. For this problem, it is assumed that Alice and Bob share a common random number $A \in \mathcal{A}$, and Eve has another random number $E \in \mathcal{E}$, which is correlated with $A$. The task is to extract from $A$ a common random number $f(A)$ that is almost independent of Eve's random number $E$. Here, Alice and Bob are only allowed to apply the same function $f$ to the common random number $A$.

Then, when the initial random variables $A$ and $E$ obey the distribution $P^{A,E}$, Eve's distinguishability can be represented by the following value:

$$d_1(P^{f(A),E}|E) := d_1(P^{f(A),E}, P_{\mathrm{mix}} \times P^E),$$

where $P_{\mathrm{mix}} \times P^E$ is the product distribution of $P_{\mathrm{mix}}$ and the marginal distribution $P^E$, and $P_{\mathrm{mix}}$ is the uniform distribution on $\{1, \ldots, M\}$. While half of this value directly gives the probability that Eve can distinguish Alice's information, we call $d_1(P^{f(A),E}|E)$ Eve's distinguishability in the following. This criterion was proposed in [22] and has also been used elsewhere in the cryptography literature. Since half of this quantity is closely related to universally composable security, we adopt it as the secrecy criterion in this paper. As another criterion, we sometimes treat

$$d_1'(P^{f(A),E}|E) := d_1(P^{f(A),E}, P^{f(A)} \times P^E).$$

Since $d_1(P^{f(A)} \times P^E, P_{\mathrm{mix}} \times P^E) = d_1(P^{f(A)}, P_{\mathrm{mix}}) \le d_1(P^{f(A),E}, P_{\mathrm{mix}} \times P^E)$, we have

$$d_1'(P^{f(A),E}|E) \le 2 d_1(P^{f(A),E}|E).$$

Further, when $P^{f(A)}$ is the uniform distribution, the two criteria coincide with each other.
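The two criteria can be evaluated directly for a small joint distribution. The following is a minimal sketch, assuming $d_1$ denotes the full $L_1$ distance (so half of it is the variational distance); the toy distribution `P_AE`, the function `f`, and all helper names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def d1(p, q):
    """L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def push_forward(p_ae, f):
    """Joint distribution of (f(A), E) obtained from that of (A, E)."""
    out = defaultdict(float)
    for (a, e), pr in p_ae.items():
        out[(f(a), e)] += pr
    return dict(out)

def marginal(p_joint, axis):
    """Marginal of a two-component joint distribution."""
    out = defaultdict(float)
    for key, pr in p_joint.items():
        out[key[axis]] += pr
    return dict(out)

def product(p, q):
    """Product distribution p x q."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

def eve_distinguishability(p_ae, f, m):
    """d_1(P^{f(A),E} | E): distance to (uniform on m values) x P^E."""
    p_be = push_forward(p_ae, f)
    uniform = {b: 1.0 / m for b in range(m)}
    return d1(p_be, product(uniform, marginal(p_be, 1)))

def eve_distinguishability_prime(p_ae, f):
    """d_1'(P^{f(A),E} | E): distance to P^{f(A)} x P^E."""
    p_be = push_forward(p_ae, f)
    return d1(p_be, product(marginal(p_be, 0), marginal(p_be, 1)))

# Hypothetical toy source: A in {0,1,2,3}, E in {0,1}, key bit f(a) = a mod 2.
P_AE = {(0, 0): 0.20, (1, 0): 0.15, (2, 0): 0.10, (3, 0): 0.05,
        (0, 1): 0.05, (1, 1): 0.10, (2, 1): 0.15, (3, 1): 0.20}
D1 = eve_distinguishability(P_AE, lambda a: a % 2, 2)
D1_PRIME = eve_distinguishability_prime(P_AE, lambda a: a % 2)
```

In this toy case $f(A)$ is exactly uniform, so the two values agree, which matches the remark that the criteria coincide when $P^{f(A)}$ is uniform.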



Next, we consider an ensemble of universal hash functions $\{f_X\}$. Similarly to the corresponding relation above, the equation

$$E_X d_1(P^{f_X(A),E}|E) = d_1(P^{B,E,X}, P_{\mathrm{mix}} \times P^E \times P^X) \tag{58}$$

holds, where $B$ is the random variable $f_X(A)$. Hence, when the expectation $E_X d_1(P^{f_X(A),E}|E)$ is sufficiently small, the random variable $f_X(A)$ is almost independent of the random variables $X$ and $E$. So, the above value is suitable even when we randomly choose a hash function.

In order to evaluate the average performance, we define the quantity

$$\phi(t|A|E|P^{A,E}) := \log \sum_e P^E(e) \Big( \sum_a P^{A|E}(a|e)^{\frac{1}{1-t}} \Big)^{1-t} = \log \sum_e \Big( \sum_a P^{A,E}(a,e)^{\frac{1}{1-t}} \Big)^{1-t}.$$

Note that when Eve's random variable $E$ takes a continuous value in the set $\mathcal{E}$, the relation (59) holds by defining $\phi(t|A|E|P^{A,E})$ in the following way:

$$\phi(t|A|E|P^{A,E}) := \log \int P^E(e) \Big( \sum_a P^{A|E}(a|e)^{\frac{1}{1-t}} \Big)^{1-t} de.$$

This definition does not depend on the choice of the measure on $\mathcal{E}$.

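By using Theorem 1 and putting $t = \frac{s}{1+s}$, any ensemble of universal hash functions $\{f_X\}$ satisfies the inequality

$$E_X d_1(P^{f_X(A),E}|E) \le 3 M^{\frac{s}{1+s}} E_e \Big( \sum_a P^{A|E}(a|e)^{1+s} \Big)^{\frac{1}{1+s}} = 3 M^t e^{\phi(t|A|E|P^{A,E})} \tag{59}$$

for $0 \le t \le \frac{1}{2}$. Therefore, there exists a function $f$ such that

$$d_1(P^{f(A),E}|E) \le 3 M^{\frac{s}{1+s}} E_e \Big( \sum_a P^{A|E}(a|e)^{1+s} \Big)^{\frac{1}{1+s}} = 3 M^t e^{\phi(t|A|E|P^{A,E})}. \tag{60}$$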
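The bound $3 M^t e^{\phi(t|A|E|P^{A,E})}$ is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: the joint distribution `P_AE`, the key size `M`, and the grid search over $t \in [0, 1/2]$ are hypothetical choices, and the two forms of $\phi$ implemented are the joint form $\log \sum_e (\sum_a P^{A,E}(a,e)^{1/(1-t)})^{1-t}$ and the conditional form $\log \sum_e P^E(e) (\sum_a P^{A|E}(a|e)^{1/(1-t)})^{1-t}$.

```python
import math

def phi(t, p_ae):
    """Joint form: log sum_e (sum_a P(a,e)^(1/(1-t)))^(1-t), for 0 <= t < 1."""
    columns = {}
    for (a, e), pr in p_ae.items():
        columns.setdefault(e, []).append(pr)
    total = sum(sum(pr ** (1.0 / (1.0 - t)) for pr in col) ** (1.0 - t)
                for col in columns.values())
    return math.log(total)

def phi_conditional(t, p_ae):
    """Conditional form: log sum_e P_E(e) (sum_a P(a|e)^(1/(1-t)))^(1-t)."""
    columns = {}
    for (a, e), pr in p_ae.items():
        columns.setdefault(e, []).append(pr)
    total = 0.0
    for col in columns.values():
        p_e = sum(col)
        total += p_e * sum((pr / p_e) ** (1.0 / (1.0 - t)) for pr in col) ** (1.0 - t)
    return math.log(total)

def best_bound(p_ae, m, steps=500):
    """Minimise 3 * M^t * exp(phi(t)) over a grid of t in [0, 1/2]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return min(3.0 * m ** t * math.exp(phi(t, p_ae)) for t in grid)

# Hypothetical joint distribution of (A, E) and key size M.
P_AE = {(0, 0): 0.20, (1, 0): 0.15, (2, 0): 0.10, (3, 0): 0.05,
        (0, 1): 0.05, (1, 1): 0.10, (2, 1): 0.15, (3, 1): 0.20}
M = 2
PHI_JOINT = phi(0.3, P_AE)
PHI_COND = phi_conditional(0.3, P_AE)
BOUND = best_bound(P_AE, M)
```

At $t = 0$ the bound reduces to the constant 3, so the bound is informative only when the minimum is attained at some $t > 0$; for a single copy of a small source it may well stay above the largest possible value of $d_1$.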

Next, we consider the case when our distribution $P^{A_n,E_n}$ is given by the $n$-fold independent and identical distribution of $P^{A,E}$, i.e., $(P^{A,E})^n$. Ahlswede and Csiszár showed that the optimal generation rate

$$G(P^{A,E}) := \sup_{\{(f_n,M_n)\}} \Big\{ \lim_{n\to\infty} \frac{\log M_n}{n} \,\Big|\, \lim_{n\to\infty} d_1(P^{f_n(A_n),E_n}|E_n) = 0 \Big\}$$

equals the conditional entropy $H(A|E)$. That is, when the generation rate $R = \lim_{n\to\infty} \frac{\log M_n}{n}$ is smaller than $H(A|E)$, the quantity $d_1(P^{f_n(A_n),E_n}|E_n)$ goes to zero. In order to treat the speed of this convergence, we focus on the supremum of the exponential rate of decrease (exponent) for $d_1(P^{f_n(A_n),E_n}|E_n)$ for a given $R$:

$$e_1(P^{A,E}|R) := \sup_{\{(f_n,M_n)\}} \Big\{ \lim_{n\to\infty} \frac{-1}{n} \log d_1(P^{f_n(A_n),E_n}|E_n) \,\Big|\, \lim_{n\to\infty} \frac{-1}{n} \log M_n \le -R \Big\}.$$

Since the relation $\phi(t|A_n|E_n|(P^{A,E})^n) = n\phi(t|A|E|P^{A,E})$ holds, the inequality (60) implies that

$$e_1(P^{A,E}|R) \ge -\phi(t|A|E|P^{A,E}) - tR \tag{61}$$

for $t \in [0, 1/2]$. That is, taking the maximum concerning $t \in [0, 1/2]$, we obtain

$$e_1(P^{A,E}|R) \ge e_\phi(A|E|P^{A,E}|R), \tag{62}$$

where

$$e_\phi(A|E|P^{A,E}|R) := \max_{0 \le t \le 1/2} -\phi(t|A|E|P^{A,E}) - tR = \max_{0 \le s \le 1} -\phi\Big(\frac{s}{1+s}\Big|A\Big|E\Big|P^{A,E}\Big) - \frac{s}{1+s}R.$$

Since $\frac{d}{dt}\phi(t|A|E|P^{A,E})|_{t=0} = -\frac{d(sH_{1+s}(A|E|P^{A,E}))}{ds}\big|_{s=0} = -H(A|E)$, the right hand sides of (62) and (63) are strictly greater than 0 for $R < H(A|E)$.

B. Comparison with the previous paper [6]

Next, we show how much better our bound is than that of the previous paper [6]. The previous paper [6] shows the following in Section IIA. There exists a sequence of functions $f_n : \mathcal{A}^n \to \{1, \ldots, \lfloor e^{nR} \rfloor\}$ such that

$$\lim_{n\to\infty} \frac{-1}{n} \log D(P^{f_n(A_n),E_n} \| P_{\mathrm{mix}} \times P^{E_n}) \ge \max_{0 \le s \le 1} sH_{1+s}(A|E|P^{A,E}) - sR,$$

where we define the function

$$sH_{1+s}(A|E|P^{A,E}) := -\log \sum_{a,e} P^E(e) P^{A|E}(a|e)^{1+s} = -\log \sum_{a,e} P^{A,E}(a,e)^{1+s} P^E(e)^{-s}$$

for $s \in [0, 1]$. Hence, applying Pinsker's inequality, we obtain

$$e_1(P^{A,E}|R) \ge \lim_{n\to\infty} \frac{-1}{n} \log d_1(P^{f_n(A_n),E_n}|E_n) \ge \tilde{e}_H(A|E|P^{A,E}|R), \tag{63}$$

where

$$\tilde{e}_H(A|E|P^{A,E}|R) := \max_{0 \le s \le 1} \frac{sH_{1+s}(A|E|P^{A,E}) - sR}{2} = \max_{0 \le t \le 1/2} \frac{tH_{\frac{1}{1-t}}(A|E|P^{A,E}) - tR}{2 - 2t}$$

with $s = \frac{t}{1-t}$. Concerning the comparison of both bounds, we prepare the following lemma.

Lemma 7: The inequality

$$-\frac{s}{1+s} H_{1+s}(A|E|P^{A,E}) \ge \phi\Big(\frac{s}{1+s}\Big|A\Big|E\Big|P^{A,E}\Big) \tag{64}$$

holds for $s \in (0, \infty)$. The equality holds if and only if the Rényi entropy $H_{1+s}(A|P^{A|E=e})$ does not depend on the choice of $e$ in the support of $P^E$.

Proof: Applying Jensen's inequality to the concave function $x \mapsto x^{\frac{1}{1+s}}$, we have

$$e^{-\frac{s}{1+s}H_{1+s}(A|E|P^{A,E})} = \Big( \sum_e P^E(e) \sum_a P^{A|E}(a|e)^{1+s} \Big)^{\frac{1}{1+s}} \ge \sum_e P^E(e) \Big( \sum_a P^{A|E}(a|e)^{1+s} \Big)^{\frac{1}{1+s}} = e^{\phi(\frac{s}{1+s}|A|E|P^{A,E})}.$$

Thus, the equality condition is that the value $\sum_a P^{A|E}(a|e)^{1+s}$ does not depend on the choice of $e$ in the support of $P^E$. Hence, we obtain the desired argument. ■

In order to compare the bounds $e_\phi(A|E|P^{A,E}|R)$ and $\tilde{e}_H(A|E|P^{A,E}|R)$, we introduce the following value:

$$e_H(A|E|P^{A,E}|R) := \max_{0 \le s \le 1} \frac{sH_{1+s}(A|E|P^{A,E}) - sR}{1+s} = \max_{0 \le t \le 1/2} tH_{\frac{1}{1-t}}(A|E|P^{A,E}) - tR.$$

Then, we obtain the following lemma.

Lemma 8:

$$e_\phi(A|E|P^{A,E}|R) \ge e_H(A|E|P^{A,E}|R) \ge \tilde{e}_H(A|E|P^{A,E}|R) \tag{65}$$

for $R < H(A|E)$. The equality in the first inequality holds if and only if the Rényi entropy $H_{1+s_0}(A|P^{A|E=e})$ does not depend on the choice of $e$ in the support of $P^E$ for $s_0 := \mathrm{argmax}_{0 \le s \le 1} -\phi(\frac{s}{1+s}|A|E|P^{A,E}) - \frac{s}{1+s}R$. The equality in the second inequality holds if and only if $\frac{H_2(A|E|P^{A,E}) - R}{2} = \max_{0 \le s \le 1} \frac{sH_{1+s}(A|E|P^{A,E}) - sR}{1+s}$.

Therefore, our exponent $e_\phi(A|E|P^{A,E}|R)$ is strictly better than the exponent $\tilde{e}_H(A|E|P^{A,E}|R)$ by [6, Section IIA] except for the case satisfying the following two conditions: (i) $-\phi(\frac{1}{2}|A|E|P^{A,E}) - \frac{1}{2}R = \max_{0 \le s \le 1} -\phi(\frac{s}{1+s}|A|E|P^{A,E}) - \frac{s}{1+s}R$. (ii) $H_2(A|P^{A|E=e})$ does not depend on the choice of $e$ in the support of $P^E$.

For example, we consider the following case: $\mathcal{A}$ equals $\mathcal{E}$, the set $\mathcal{A}$ has the module structure (i.e., $\mathcal{A}$ is an Abelian group), and the conditional distribution $P^{A|E}(a|e)$ has the form $P^A(a - e)$. Then, the equality condition for the first inequality holds. Since

$$e^{\phi(\frac{s}{1+s}|A|E|P^{A,E})} = \sum_e P^E(e) \Big( \sum_a P^A(a-e)^{1+s} \Big)^{\frac{1}{1+s}} = e^{-\frac{sH_{1+s}(A|P^A)}{1+s}}$$

and

$$e^{-sH_{1+s}(A|E|P^{A,E})} = \sum_e P^E(e) \sum_a P^{A|E}(a|e)^{1+s} = \sum_e P^E(e) \sum_a P^A(a-e)^{1+s} = e^{-sH_{1+s}(A|P^A)},$$

the bounds $e_\phi(A|E|P^{A,E}|R)$ and $\tilde{e}_H(A|E|P^{A,E}|R)$ can be simplified to

$$e_\phi(A|E|P^{A,E}|R) = e_H(A|E|P^{A,E}|R) = e_H(A|P^A|R),$$
$$\tilde{e}_H(A|E|P^{A,E}|R) = \tilde{e}_H(A|P^A|R),$$

where

$$e_H(A|P^A|R) := \max_{0 \le s \le 1} \frac{sH_{1+s}(A|P^A) - sR}{1+s} = \max_{0 \le t \le 1/2} tH_{\frac{1}{1-t}}(A|P^A) - tR,$$
$$\tilde{e}_H(A|P^A|R) := \max_{0 \le s \le 1} \frac{sH_{1+s}(A|P^A) - sR}{2} = \max_{0 \le t \le 1/2} \frac{tH_{\frac{1}{1-t}}(A|P^A) - tR}{2 - 2t}.$$

In particular, both exponents are numerically plotted in Fig. 2 when $\mathcal{A} = \{0, 1\}$, $P^A(0) = p$, and $P^A(1) = 1 - p$.

Proof: The first inequality and its equality condition follow from Lemma 7 and the definitions of $e_\phi(A|E|P^{A,E}|R)$ and $e_H(A|E|P^{A,E}|R)$. The second inequality follows from the inequality $\frac{1}{2} \le \frac{1}{1+s}$ for $s \in [0, 1]$. Since the equality holds only when $s = 1$, we obtain the equality condition for the second inequality. ■
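The ordering of the three exponents in Lemma 8 can be checked numerically. The sketch below is an illustration under assumptions: the joint distribution `P_AE`, the rate `0.3`, and the grid over $s \in [0, 1]$ are hypothetical, and the three quantities are computed as $\max_s -\phi(\frac{s}{1+s}) - \frac{s}{1+s}R$, $\max_s \frac{sH_{1+s} - sR}{1+s}$ and $\max_s \frac{sH_{1+s} - sR}{2}$ with $sH_{1+s}(A|E|P^{A,E}) = -\log \sum_{a,e} P^{A,E}(a,e)^{1+s} P^E(e)^{-s}$.

```python
import math

def s_renyi_cond(s, p_ae):
    """sH_{1+s}(A|E|P^{A,E}) = -log sum_{a,e} P(a,e)^(1+s) * P_E(e)^(-s)."""
    p_e = {}
    for (a, e), pr in p_ae.items():
        p_e[e] = p_e.get(e, 0.0) + pr
    return -math.log(sum(pr ** (1 + s) * p_e[e] ** (-s)
                         for (a, e), pr in p_ae.items()))

def phi(t, p_ae):
    """phi(t|A|E|P^{A,E}) = log sum_e (sum_a P(a,e)^(1/(1-t)))^(1-t)."""
    columns = {}
    for (a, e), pr in p_ae.items():
        columns.setdefault(e, []).append(pr)
    return math.log(sum(sum(pr ** (1 / (1 - t)) for pr in col) ** (1 - t)
                        for col in columns.values()))

def exponents(p_ae, rate, steps=1000):
    """Grid approximations of (e_phi, e_H, e_H_tilde) at the given rate."""
    grid = [i / steps for i in range(steps + 1)]  # s in [0, 1]
    e_phi = max(-phi(s / (1 + s), p_ae) - s / (1 + s) * rate for s in grid)
    e_h = max((s_renyi_cond(s, p_ae) - s * rate) / (1 + s) for s in grid)
    e_h_tilde = max((s_renyi_cond(s, p_ae) - s * rate) / 2 for s in grid)
    return e_phi, e_h, e_h_tilde

# Hypothetical source whose conditional Renyi entropies differ between e = 0 and e = 1.
P_AE = {(0, 0): 0.40, (1, 0): 0.10, (0, 1): 0.25, (1, 1): 0.25}
E_PHI, E_H, E_H_TILDE = exponents(P_AE, rate=0.3)
```

Because the same grid of $s$ is used for all three maxima, the ordering holds pointwise in $s$ (via Lemma 7 and $\frac{1}{2} \le \frac{1}{1+s}$), so the grid approximation does not disturb it.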



Fig. 2. Lower bounds of $e_1(P^{A,E}|R)$ (vertical axis: exponential rate). Thick line: $e_H(A|P^A|R)$ (the present paper). Normal line: $\tilde{e}_H(A|P^A|R)$ (by [6]). Dashed line: $\frac{H_2(A|P^A) - R}{2}$ (direct application without smoothing). $p = 0.200$, $h(p) = H(A) = 0.500$, $2\frac{d(sH_{1+s}(A))}{ds}\big|_{s=1} - H_2(A) = 0.224$.
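The numerical constants quoted in the caption of Fig. 2 can be reproduced from the binary distribution $P^A(0) = p$, $P^A(1) = 1 - p$. This is a minimal sketch assuming natural logarithms and $sH_{1+s}(A|P^A) = -\log \sum_a P^A(a)^{1+s}$; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """h(p) in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def s_renyi(s, p):
    """sH_{1+s}(A|P^A) = -log(p^(1+s) + (1-p)^(1+s))."""
    return -math.log(p ** (1 + s) + (1 - p) ** (1 + s))

def d_s_renyi(s, p):
    """Derivative of s -> sH_{1+s}(A|P^A) with respect to s."""
    num = p ** (1 + s) * math.log(p) + (1 - p) ** (1 + s) * math.log(1 - p)
    den = p ** (1 + s) + (1 - p) ** (1 + s)
    return -num / den

def e_h(p, rate, steps=2000):
    """Grid approximation of max_{0<=s<=1} (sH_{1+s} - sR) / (1+s)."""
    return max((s_renyi(i / steps, p) - i / steps * rate) / (1 + i / steps)
               for i in range(steps + 1))

def e_h_tilde(p, rate, steps=2000):
    """Grid approximation of max_{0<=s<=1} (sH_{1+s} - sR) / 2."""
    return max((s_renyi(i / steps, p) - i / steps * rate) / 2
               for i in range(steps + 1))

P = 0.2
ENTROPY = binary_entropy(P)
# Rate at which the derivative of (sH_{1+s} - sR)/(1+s) vanishes at s = 1.
CRITICAL = 2 * d_s_renyi(1.0, P) - s_renyi(1.0, P)
```

Setting the derivative of $\frac{sH_{1+s} - sR}{1+s}$ to zero at $s = 1$ gives $R = 2\frac{d(sH_{1+s})}{ds}|_{s=1} - H_2$, which is the last quantity listed in the caption; below this rate the maximum in $e_H$ sits at $s = 1$.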



VI. The wire-tap channel in a general framework 

Next, we consider the wire-tap channel model, in which the eavesdropper (wire-tapper) Eve and the authorized receiver Bob receive information from the authorized sender Alice. In this case, in order for Eve to have less information, Alice chooses a suitable encoding. This problem is formulated as follows. Let $\mathcal{Y}$ and $\mathcal{Z}$ be the probability spaces of Bob and Eve, and $\mathcal{X}$ be the set of alphabets sent by Alice. Then, the main channel from Alice to Bob is described by $W^B : x \mapsto W^B_x$, and the wire-tapper channel from Alice to Eve is described by $W^E : x \mapsto W^E_x$. That is, $W^B_x$ is the output distribution on Bob's side with Alice's input $x$, and $W^E_x$ is the output distribution on Eve's side with Alice's input $x$. In this setting, in order to send the secret message in $\{1, \ldots, M\}$ subject to the uniform distribution, Alice chooses $M$ distributions $Q_1, \ldots, Q_M$ on $\mathcal{X}$, and she generates $x \in \mathcal{X}$ subject to $Q_i$ when she wants to send the message $i \in \{1, \ldots, M\}$. Bob prepares $M$ disjoint subsets $\mathcal{D}_1, \ldots, \mathcal{D}_M$ of $\mathcal{Y}$ and judges that the message is $i$ if $y$ belongs to $\mathcal{D}_i$. Therefore, the triplet $(M, \{Q_1, \ldots, Q_M\}, \{\mathcal{D}_1, \ldots, \mathcal{D}_M\})$ is called a code, and is denoted by $\Phi$. Its performance is given by the following three quantities. The first is the size $M$, which is denoted by $|\Phi|$. The second is the average error probability $\epsilon_B(\Phi)$:

M 

and the third is Eve's distinguishability $d_1(\Phi|E)$:
di($\E) :=dx{Wi x P^ ix ,W E m) 



The quantity $d_1(\Phi|E)$ gives an upper bound for the probability that Eve can succeed in distinguishing whether Alice's information belongs to a given subset. So, the value can be regarded as Eve's distinguishability. In order to calculate these values, we introduce the following quantity:

$$\phi(t|W,p) := \log \sum_y \Big( \sum_x p(x) (W_x(y))^{\frac{1}{1-t}} \Big)^{1-t}.$$

When the random variable $Y$ takes a continuous value in the set $\mathcal{Y}$ while $X$ takes a discrete value, the above definition can be changed to

$$\phi(t|W,p) := \log \int_{\mathcal{Y}} \Big( \sum_x p(x) (W_x(y))^{\frac{1}{1-t}} \Big)^{1-t} dy.$$

This definition does not depend on the choice of the measure on $\mathcal{Y}$. That is, when $W'_x(y) f(y) = W_x(y)$ for a positive function $f$,

$$\phi(t|W,p) = \log \int_{\mathcal{Y}} \Big( \sum_x p(x) (W'_x(y))^{\frac{1}{1-t}} \Big)^{1-t} f(y)\, dy.$$

As is shown as Lemma 1 of [6], $\phi(t|W,p)$ satisfies the following lemma.

Lemma 9: The function $p \mapsto e^{\phi(t|W,p)}$ is convex for $t \in [-1, 0]$, and is concave for $t \in [0, 1]$.
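The quantity $e^{\phi(t|W,p)} = \sum_y (\sum_x p(x) W_x(y)^{1/(1-t)})^{1-t}$ and the statement of Lemma 9 can be probed numerically for a discrete channel. The sketch below is an illustration only: the channel matrix `W`, the input distributions `P1` and `P2`, and the test values of $t$ are hypothetical, and the midpoint check is a spot check, not a proof. It also compares $\phi(t|W,p)/t$ for small $t$ with the mutual information $I(p:W)$, a limit used later in this section.

```python
import math

def exp_phi(t, w, p):
    """exp(phi(t|W,p)) for a channel matrix w[x][y] and input distribution p, t != 1."""
    n_out = len(w[0])
    return sum(sum(p[x] * w[x][y] ** (1.0 / (1.0 - t)) for x in range(len(p))) ** (1.0 - t)
               for y in range(n_out))

def mutual_information(w, p):
    """I(p:W) in nats."""
    n_out = len(w[0])
    w_p = [sum(p[x] * w[x][y] for x in range(len(p))) for y in range(n_out)]
    return sum(p[x] * w[x][y] * math.log(w[x][y] / w_p[y])
               for x in range(len(p)) for y in range(n_out) if w[x][y] > 0)

def mix(p1, p2, lam):
    """Convex combination lam * p1 + (1 - lam) * p2."""
    return [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]

# Hypothetical binary-input, binary-output channel and two input distributions.
W = [[0.9, 0.1], [0.3, 0.7]]
P1, P2 = [0.2, 0.8], [0.7, 0.3]
MID = mix(P1, P2, 0.5)

# Midpoint concavity for t in [0, 1) and midpoint convexity for t in [-1, 0].
CONCAVE_GAP = exp_phi(0.3, W, MID) - 0.5 * (exp_phi(0.3, W, P1) + exp_phi(0.3, W, P2))
CONVEX_GAP = 0.5 * (exp_phi(-0.5, W, P1) + exp_phi(-0.5, W, P2)) - exp_phi(-0.5, W, MID)

# Slope of phi at t = 0 compared with the mutual information.
SLOPE = math.log(exp_phi(1e-5, W, P1)) / 1e-5
INFO = mutual_information(W, P1)
```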

Now, using the function $\phi(t)$, we make a code for the wire-tap channel based on the random coding method. For this purpose, we make a protocol to share a random number. First, we generate the random code $\Phi(Y)$ with size $LM$, which is described as $\Phi(Y)(a) = Y_a$ for $a = 1, \ldots, LM$ by using the $LM$ independent and identical random variables $Y = (Y_1, \ldots, Y_{ML})$ subject to the distribution $p$ on $\mathcal{X}$. Gallager [20] showed that the ensemble expectation of the average error probability concerning decoding the input message $A$ is less than $(ML)^t e^{\phi(-t|W^B,p)}$ for $0 \le t \le 1$ when Bob applies the maximum likelihood decoder $\mathcal{D}'(Y)$ of the code $\Phi(Y)$. After sending the random variable $A$ taking values in the set with cardinality $ML$, Alice and Bob apply the above universal hash functions $f_X$ to the random variable $A$ and generate another piece of data of size $M$. Here, we assume that the ensemble $\{f_X\}$ satisfies Condition 2. Then, Alice and Bob share the random variable $f_X(A)$ with size $M$. This protocol is denoted by $\Phi(X,Y)'$.

Let $E$ be the random variable of the output of Eve's channel $W^E$. When $p$ is the uniform distribution on the set $\mathcal{C} := \{1, \ldots, ML\}$ and the joint distribution $P^{C,E}$ is given by $P^{C,E}(c,e) := p(c) W^E_c(e)$, the equations

$$e^{\phi(t|C|E|P^{C,E})} = \sum_e \Big( \sum_c \big(p(c) W^E_c(e)\big)^{\frac{1}{1-t}} \Big)^{1-t} = \frac{1}{M^t L^t} e^{\phi(t|W^E,p)} \tag{66}$$

hold.

For a given code $\Phi(Y)$, we apply the inequality (59) to Eve's distinguishability. Then,

$$E_{X|Y} d_1(\Phi(X,Y)'|E) \le 3 \frac{e^{\phi(t|W^E, p_{\mathrm{mix},\Phi(Y)})}}{L^t} \tag{67}$$

for $0 \le \forall t \le \frac{1}{2}$. The concavity of $e^{\phi(t|W^E,p)}$ (Lemma 9) guarantees that

$$E_{X,Y} d_1(\Phi(X,Y)'|E) \le 3 E_Y \frac{e^{\phi(t|W^E, p_{\mathrm{mix},\Phi(Y)})}}{L^t} \le 3 \frac{e^{\phi(t|W^E,p)}}{L^t}$$

for $0 \le \forall t \le \frac{1}{2}$.

Now, we make a code for the wire-tap channel by modifying the above protocol $\Phi(X,Y)'$. First, we choose the distribution $Q_i$ to be the uniform distribution on $f_X^{-1}\{i\}$. When Alice wants to send the secret message $i$, before sending the random variable $A$, Alice generates the random number $A$ subject to the distribution $Q_i$. Alice sends the random variable $A$. Bob recovers the random variable $A$ by using the maximum likelihood decoder $\mathcal{D}'(Y)$, and applies the function $f_X$. Then, Bob decodes Alice's message $i$, and this code for the wire-tap channel $W^B, W^E$ is denoted by $\Phi(X,Y)$. Since the ensemble $\{f_X\}$ satisfies Condition 2 and the secret message $i$ obeys the uniform distribution on $\{1, \ldots, M\}$, this protocol $\Phi(X,Y)$ has the same performance as the above protocol $\Phi(X,Y)'$.

Finally, we consider what code is derived from the above random coding discussion. Using the Markov inequality, we obtain

$$P_{X,Y}\{\epsilon_B(\Phi(X,Y)) \le 3 E_{X,Y} \epsilon_B(\Phi(X,Y))\} > \frac{2}{3},$$
$$P_{X,Y}\{d_1(\Phi(X,Y)|E) \le 3 E_{X,Y} d_1(\Phi(X,Y)|E)\} > \frac{2}{3}.$$

Therefore, the existence of a good code is guaranteed in the following way. That is, we give the concrete performance of a code whose existence is shown by the above random coding method.

Theorem 5: There exists a code $\Phi$ for any integers $L$, $M$, and any probability distribution $p$ on $\mathcal{X}$ such that $|\Phi| = M$ and

$$\epsilon_B(\Phi) \le 3 \min_{0 \le t \le 1} (ML)^t e^{\phi(-t|W^B,p)},$$
$$d_1(\Phi|E) \le 9 \min_{0 \le t \le 1/2} \frac{e^{\phi(t|W^E,p)}}{L^t}.$$

In the $n$-fold discrete memoryless channels $W^{B_n}$ and $W^{E_n}$ of the channels $W^B$ and $W^E$, the additive equation $\phi(t|W^{B_n},p) = n\phi(t|W^B,p)$ holds. Thus, there exists a code $\Phi_n$ for any integers $L_n$, $M_n$, and any probability distribution $p$ on $\mathcal{X}$ such that $|\Phi_n| = M_n$ and

$$\epsilon_B(\Phi_n) \le 3 \min_{0 \le t \le 1} (M_n L_n)^t e^{n\phi(-t|W^B,p)},$$
$$d_1(\Phi_n|E) \le 9 \min_{0 \le t \le 1/2} \frac{e^{n\phi(t|W^E,p)}}{L_n^t}.$$

Since $\lim_{t \to 0} \frac{\phi(t|W^E,p)}{t} = I(p : W^E)$, the rate $\max_p I(p : W^B) - I(p : W^E)$ can be asymptotically attained. Therefore, when the sacrifice information rate is $R$, i.e., $L_n = e^{nR}$, the exponential rate of decrease for Eve's distinguishability is greater than

$$e_\phi(R|W^E,p) := \max_{0 \le t \le 1/2} tR - \phi(t|W^E,p).$$

VII. Comparison with existing bound

In Subsection VII-A, we compare our exponent $e_\phi(R|W^E,p)$ with those derived by [17], [6] in the general setting. In Subsections VII-B and VII-C, using the discussion in Subsection V-B, we treat this comparison in special cases more deeply.

A. General case

Now, we compare the obtained lower bound $e_\phi(R|W^E,p)$ for the exponential rate of decrease for Eve's distinguishability with the existing lower bounds [17], [6]. Using the quantity

$$\psi(t|W,p) := \log \sum_y \Big( \sum_x p(x) (W_x(y))^{1+t} \Big) W_p(y)^{-t}, \tag{68}$$
$$W_p(y) := \sum_x p(x) W_x(y),$$

the previous paper [17] derived the following lower bound of this exponential rate of decrease:

$$e_\psi(R|W^E,p) := \max_{0 \le s \le 1} \frac{sR - \psi(s|W^E,p)}{1+s} = \max_{0 \le t \le 1/2} tR - (1-t)\psi\Big(\frac{t}{1-t}\Big|W^E,p\Big). \tag{69}$$

The other previous paper [6] also derived the following lower bound:

$$\max_{0 \le s \le 1} sR - \psi(s|W^E,p) \tag{70}$$

for the exponential rate of decrease for the mutual information. By applying a discussion similar to Subsection V-B and Pinsker's inequality, the bound (70) yields the bound

$$\tilde{e}_\psi(R|W^E,p) := \max_{0 \le s \le 1} \frac{sR - \psi(s|W^E,p)}{2}, \tag{71}$$

which is smaller than the lower bound $e_\psi(R|W^E,p)$ because $\frac{1}{2} \le \frac{1}{1+s}$ for $0 \le s \le 1$. Hence, in order to show the superiority of our bound $e_\phi(R|W^E,p)$, it is sufficient to show the superiority over the bound $e_\psi(R|W^E,p)$.

In the following, we compare the two bounds $e_\phi(R|W^E,p)$ and $e_\psi(R|W^E,p)$. For this purpose, we treat $e^{\phi(t|W^E,p)}$ and $e^{(1-t)\psi(\frac{t}{1-t}|W^E,p)}$ for $0 < t < \frac{1}{2}$. The reverse Hölder inequality [28] with the measurable space $(\mathcal{X},p)$ is given as

$$\sum_x p(x)|X(x)Y(x)| \ge \Big( \sum_x p(x)|X(x)|^{\frac{1}{1+s}} \Big)^{1+s} \Big( \sum_x p(x)|Y(x)|^{-\frac{1}{s}} \Big)^{-s}$$

for $s > 0$. Using this inequality, we obtain






E 

y 

>-k 
E 



w p ( y y 



l+s 



1 \ 1 + s 

1+S 



1 \ 1 + s 

1 + s 



E^w 



Substituting s = y^j, we obtain 

5>(x)(W a (y))^ 
j L i 

which implies 

(l-t)V(T^|W E ,p) 




^E 



_ X 



w p ( y y 



(t\W E ,p) 



Thus, our bound $e_\phi(R|W^E,p)$ for the exponential rate of decrease is better than the existing bound $e_\psi(R|W^E,p)$ [17].

Example 1: Assume that X = £ = {0, 1}. We consider the 
following channel. 

Wo(0) = a, Wo(l) = 1 - a, W x (0) = 1 - 9a, Wi(l) = 9a. 
When p(0) = 1/2,^(1) = 1/2, 

7(p,^)= fe (l/2-5 P )- ^ + ^ 
WlftW) =log(^( fll+t + (1 2 ' 9a)1+ ^ - 5p)" 4 

+ ( (9p)^ + (i-,)^ (1/2 + 5prt A 



0(t|p,W)=log ( 



,i/(i-t) + (l_ 9p )i/(i-t) 



+ ( (9 P ) i /( i - t ) + (i- P ) i /( i - t ) )1 _; 



Fig. 3. Lower bounds of exponent (vertical axis: exponential rate). Thick line: $e_\phi(R|W,p)$ (the present paper). Normal line: $e_\psi(R|W,p)$ [17]. Dashed line: $\tilde{e}_\psi(R|W,p)$ [6]. $a = 0.0500$, $I(p,W) = 0.119$.

Then, the three bounds $e_\phi(R|W,p)$, $e_\psi(R|W,p)$, and $\tilde{e}_\psi(R|W,p)$ with $a = 0.05$ are numerically compared as in Fig. 3.

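B. Additive case

Next, we consider a more specific case. When $\mathcal{X} = \mathcal{Z}$, $\mathcal{X}$ is a module, and $W_x(z) = W_0(z - x) = P^X(z - x)$, the channel $W$ is called additive.

Since

$$e^{\phi(t|W^E,p_{\mathrm{mix}})} = e^{(1-t)\psi(\frac{t}{1-t}|W^E,p_{\mathrm{mix}})} = |\mathcal{X}|^t e^{-tH_{\frac{1}{1-t}}(X|P^X)}, \tag{72}$$

any additive channel $W^E$ satisfies

$$e_\phi(R|W^E,p_{\mathrm{mix}}) = e_\psi(R|W^E,p_{\mathrm{mix}}) = \max_{0 \le t \le 1/2} t(R - \log|\mathcal{X}|) + tH_{\frac{1}{1-t}}(X|P^X) = e_H(X|P^X|\log|\mathcal{X}| - R) \tag{73}$$

and

$$\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}}) = \max_{0 \le t \le 1/2} \frac{t(R - \log|\mathcal{X}|) + tH_{\frac{1}{1-t}}(X|P^X)}{2 - 2t} = \tilde{e}_H(X|P^X|\log|\mathcal{X}| - R)$$

for the uniform distribution $p_{\mathrm{mix}}$ on $\mathcal{X}$.

Hence, our bound $e_\phi(R|W^E,p_{\mathrm{mix}})$ is the same as the previous bound $e_\psi(R|W^E,p_{\mathrm{mix}})$. However, since $\frac{1}{2-2t} < 1$ for $t \in [0, 1/2)$, our bound $e_\phi(R|W^E,p_{\mathrm{mix}})$ is strictly better than the bound $\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}})$ by the other previous paper [6] when the maximum is attained by $t \in [0, 1/2)$.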
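For an additive channel on a cyclic group, the identity $e^{\phi(t|W^E,p_{\mathrm{mix}})} = e^{(1-t)\psi(\frac{t}{1-t}|W^E,p_{\mathrm{mix}})} = |\mathcal{X}|^t (\sum_x P^X(x)^{1/(1-t)})^{1-t}$ can be confirmed numerically, where the last factor equals $e^{-tH_{1/(1-t)}(X|P^X)}$. The sketch below is only an illustration: the group $\mathbb{Z}_3$, the noise distribution `NOISE`, and the value $t = 0.3$ are hypothetical choices.

```python
import math

def additive_channel(noise):
    """Channel matrix W[x][z] = P(z - x mod n) on the cyclic group Z_n."""
    n = len(noise)
    return [[noise[(z - x) % n] for z in range(n)] for x in range(n)]

def exp_phi(t, w, p):
    """exp(phi(t|W,p)) = sum_z (sum_x p(x) W_x(z)^(1/(1-t)))^(1-t)."""
    n_out = len(w[0])
    return sum(sum(p[x] * w[x][z] ** (1.0 / (1.0 - t)) for x in range(len(p))) ** (1.0 - t)
               for z in range(n_out))

def exp_psi(s, w, p):
    """exp(psi(s|W,p)) = sum_z (sum_x p(x) W_x(z)^(1+s)) * W_p(z)^(-s)."""
    n_out = len(w[0])
    total = 0.0
    for z in range(n_out):
        w_p = sum(p[x] * w[x][z] for x in range(len(p)))
        total += sum(p[x] * w[x][z] ** (1 + s) for x in range(len(p))) * w_p ** (-s)
    return total

# Hypothetical additive noise on Z_3 and uniform input distribution.
NOISE = [0.7, 0.2, 0.1]
N = len(NOISE)
W = additive_channel(NOISE)
P_MIX = [1.0 / N] * N
T = 0.3

LHS_PHI = exp_phi(T, W, P_MIX)
LHS_PSI = exp_psi(T / (1 - T), W, P_MIX) ** (1 - T)
RHS = N ** T * sum(q ** (1.0 / (1.0 - T)) for q in NOISE) ** (1.0 - T)
```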

C. General additive case

We consider a more general case. Eve is assumed to have two random variables $Z \in \mathcal{X}$ and $Z' \in \mathcal{Z}'$. The first random variable $Z$ is the output of an additive channel depending on the second variable $Z'$. That is, the channel $W^E_x(z,z')$ can be written as $W^E_x(z,z') = P^{X,Z'}(z - x, z')$, where $P^{X,Z'}$ is a joint distribution. Hereinafter, this channel model is called a general additive channel. This channel is also called a regular channel [21]. For this channel model, we obtain

$$e^{\phi(s|W^E,P_{\mathrm{mix},\mathcal{X}})} = \sum_{z,z'} \Big( \sum_x \frac{1}{|\mathcal{X}|} P^{X,Z'}(z-x,z')^{\frac{1}{1-s}} \Big)^{1-s} = \frac{1}{|\mathcal{X}|^{1-s}} \sum_{z,z'} \Big( \sum_x P^{X,Z'}(x,z')^{\frac{1}{1-s}} \Big)^{1-s} = |\mathcal{X}|^s \sum_{z'} \Big( \sum_x P^{X,Z'}(x,z')^{\frac{1}{1-s}} \Big)^{1-s} = |\mathcal{X}|^s e^{\phi(s|X|Z'|P^{X,Z'})} \tag{74}$$

and

$$e^{\psi(s|W^E,p_{\mathrm{mix}})} = \sum_{z,z'} \Big( \sum_x \frac{1}{|\mathcal{X}|} W^E_x(z,z')^{1+s} \Big) \Big( \sum_x \frac{1}{|\mathcal{X}|} W^E_x(z,z') \Big)^{-s} = |\mathcal{X}|^{s-1} \sum_{z,z'} \Big( \sum_x P^{X,Z'}(z-x,z')^{1+s} \Big) \Big( \sum_x P^{X,Z'}(z-x,z') \Big)^{-s} = |\mathcal{X}|^{s-1} \sum_{z,z'} \Big( \sum_x P^{X,Z'}(x,z')^{1+s} \Big) P^{Z'}(z')^{-s} = |\mathcal{X}|^s \sum_{z'} \sum_x P^{X,Z'}(x,z')^{1+s} P^{Z'}(z')^{-s} = |\mathcal{X}|^s e^{-sH_{1+s}(X|Z'|P^{X,Z'})}. \tag{75}$$

Then, the equalities

$$e_\phi(R|W^E,p_{\mathrm{mix}}) = \max_{0 \le t \le 1/2} t(R - \log|\mathcal{X}|) - \phi(t|X|Z'|P^{X,Z'}) = e_\phi(X|Z'|P^{X,Z'}|\log|\mathcal{X}| - R), \tag{76}$$
$$e_\psi(R|W^E,p_{\mathrm{mix}}) = \max_{0 \le t \le 1/2} t(R - \log|\mathcal{X}|) + tH_{\frac{1}{1-t}}(X|Z'|P^{X,Z'}) = e_H(X|Z'|P^{X,Z'}|\log|\mathcal{X}| - R), \tag{77}$$
$$\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}}) = \max_{0 \le t \le 1/2} \frac{t(R - \log|\mathcal{X}|) + tH_{\frac{1}{1-t}}(X|Z'|P^{X,Z'})}{2 - 2t} = \tilde{e}_H(X|Z'|P^{X,Z'}|\log|\mathcal{X}| - R) \tag{78}$$

hold.

Hence, the observation in Section V-B can be applied to the comparison among $e_\phi(R|W^E,p_{\mathrm{mix}})$, $e_\psi(R|W^E,p_{\mathrm{mix}})$, and $\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}})$. Due to Lemma 8, $e_\phi(R|W^E,p_{\mathrm{mix}})$ is strictly better than $e_\psi(R|W^E,p_{\mathrm{mix}})$ and $\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}})$ except for the special case mentioned in Lemma 8.

VIII. Wire-tap channel with linear coding

In a practical sense, we need to take into account the decoding time. For this purpose, we often restrict our codes to linear codes. In the following, we consider the case where the sender's space $\mathcal{X}$ has the structure of a module. When an error correcting code is given as a submodule $C_1 \subset \mathcal{X}$ and the decoder by the authorized receiver is given as $\{\mathcal{D}_x\}_{x \in C_1}$, our code for a wire-tap channel is given as $\Phi_{C_1,C_2} = (|C_1/C_2|, \{Q_{[x]}\}_{[x] \in C_1/C_2}, \{\mathcal{D}_{[x]}\}_{[x] \in C_1/C_2})$ based on a submodule $C_2$ of $C_1$ as follows. The encoding $Q_{[x]}$ is given as the uniform distribution on the coset $[x] := x + C_2$, and the decoding $\mathcal{D}_{[x]}$ is given as the subset $\bigcup_{x' \in x + C_2} \mathcal{D}_{x'}$. Next, we consider a submodule $C_2(X)$ of $C_1$ with cardinality $|C_2(X)| = L$ that is labeled by a random variable $X$. Then, the module $C_2(X)$ can be regarded as a random variable. Now, we impose on the module $C_2(X)$ the following condition.

Condition 4: Any element $x \ne 0 \in C_1$ is included in $C_2(X)$ with probability at most $\frac{L}{|C_1|}$.

Then, using (67), we can evaluate the performance of the constructed code in the following way.

Theorem 6: Choose the subcode $C_2(X)$ according to Condition 4. We construct the code $\Phi_{C_1,C_2(X)}$ by choosing the distribution $Q_{[x]}$ to be the uniform distribution on $[x]$ for $[x] \in C_1/C_2(X)$. Then, we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 \frac{e^{\phi(t|W^E,P_{\mathrm{mix},C_1})}}{L^t}, \tag{79}$$

where $P_{\mathrm{mix},S}$ is the uniform distribution on the subset $S$.

When the channel $W^E$ is additive, i.e., $W^E_x(z) = P^X(z - x)$, the equation $\phi(t|W^E,P_{\mathrm{mix},C_1+x}) = \phi(t|W^E,P_{\mathrm{mix},C_1})$ holds for any $x$. Thus, the concavity of $e^{\phi(t|W^E,p)}$ (Lemma 9) implies that

$$\phi(t|W^E,P_{\mathrm{mix},C_1}) \le \phi(t|W^E,P_{\mathrm{mix},\mathcal{X}}). \tag{80}$$

Thus, combining (79), (80), and (72), we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 \frac{|\mathcal{X}|^t e^{-tH_{\frac{1}{1-t}}(X|P^X)}}{L^t} \tag{81}$$

for $0 \le \forall t \le \frac{1}{2}$. That is, when $L = e^R$, taking the minimum concerning $0 \le \forall t \le \frac{1}{2}$, we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 e^{-e_H(X|P^X|\log|\mathcal{X}| - R)}. \tag{82}$$

When the additive noise obeys the $n$-fold i.i.d. of $P^X$ on $\mathcal{X}^n$ and $L = e^{nR}$, we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 e^{-n e_H(X|P^X|\log|\mathcal{X}| - R)}. \tag{83}$$

Similarly, when the channel $W^E$ is general additive, i.e., $W^E_x(z,z') = P^{X,Z'}(z - x, z')$, combining (79), (80), and (74), we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 \frac{|\mathcal{X}|^t e^{\phi(t|X|Z'|P^{X,Z'})}}{L^t} \tag{84}$$

for $0 \le \forall t \le \frac{1}{2}$. That is, when $L = e^R$, taking the minimum concerning $0 \le \forall t \le \frac{1}{2}$, we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 e^{-e_\phi(X|Z'|P^{X,Z'}|\log|\mathcal{X}| - R)}. \tag{85}$$

In the $n$-fold i.i.d. case, when $L = e^{nR}$, we obtain

$$E_X d_1(\Phi_{C_1,C_2(X)}|E) \le 3 e^{-n e_\phi(X|Z'|P^{X,Z'}|\log|\mathcal{X}| - R)}. \tag{86}$$

When $\mathcal{X}$ is an $n$-dimensional vector space $\mathbb{F}_q^n$ over the finite field $\mathbb{F}_q$, the bound can be attained by the combination of a linear code and the concatenation of a Toeplitz matrix and the identity $(X, I)$ of the size $m \times (m - k)$ [6]. Hence, if the error correcting code $C_1$ is realizable, the whole process in the above code is realizable.
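The concatenation of a Toeplitz matrix and the identity can be illustrated over the binary field. The following is a minimal sketch under stated assumptions, not the construction of [6] verbatim: it takes the field to be $\mathbb{F}_2$, uses a hypothetical layout in which the hash matrix is $[T \mid I]$ with $T$ an `out_len` by `k` Toeplitz block filled from `out_len + k - 1` seed bits, and all function names are illustrative. The exhaustive check confirms that every nonzero input lies in the kernel of the random matrix with probability at most $2^{-\mathrm{out\_len}}$, which is the kind of property required by the universal hash condition and by Condition 4 (with $L = 2^k$ and $|C_1| = 2^{k + \mathrm{out\_len}}$ in this toy setting).

```python
from itertools import product

def toeplitz_identity_hash(seed_bits, k, out_len):
    """Return the out_len x (k + out_len) binary matrix [T | I].

    T is a Toeplitz block: entry (i, j) depends only on i - j.
    """
    assert len(seed_bits) == out_len + k - 1
    rows = []
    for i in range(out_len):
        toeplitz_row = [seed_bits[i - j + k - 1] for j in range(k)]
        identity_row = [1 if c == i else 0 for c in range(out_len)]
        rows.append(toeplitz_row + identity_row)
    return rows

def apply_hash(matrix, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in matrix)

def max_kernel_probability(k, out_len):
    """Largest probability, over a uniform seed, that a fixed nonzero x hashes to zero."""
    n_seed = out_len + k - 1
    worst = 0.0
    for x in product((0, 1), repeat=k + out_len):
        if not any(x):
            continue
        hits = sum(1 for seed in product((0, 1), repeat=n_seed)
                   if not any(apply_hash(toeplitz_identity_hash(seed, k, out_len), x)))
        worst = max(worst, hits / 2 ** n_seed)
    return worst

# Small exhaustive check: 5 input bits, 3 sacrificed bits, 2 output bits.
WORST = max_kernel_probability(k=3, out_len=2)
```

Since the hash is linear, two inputs collide exactly when their difference lies in the kernel, so the kernel probability bounds the collision probability. Only `out_len + k - 1` seed bits are needed, which is the practical appeal of the Toeplitz structure.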

Remark 1: In the additive case, due to (73), the exponent of the upper bound given in (83) is the same as that given by the previous paper [17]. However, the code given in [17] is constructed by completely random coding, whereas the code given in this section is based on an ordinary linear code. For the security, it requires only the universal hash condition. So, our construction requires smaller complexity than that given in [17]. In the general additive case, our exponent (86) is strictly better than that given in [17], which is calculated in (77).

Next, we consider the relation with the other previous paper [6] in the general additive case. The protocol given in [6] is quite similar to ours. However, as is shown in Lemma 8, except for the very special case, our exponent (86) is strictly better than that given in [6], which is calculated in (78). Remember that the exponent given in [6] is $\tilde{e}_\psi(R|W^E,p_{\mathrm{mix}})$, which is mentioned around (71).

IX. Secret key generation with public communication

Furthermore, the above result can be applied to secret key generation (distillation) with one-way public communication, in which Alice, Bob, and Eve are assumed to have initial random variables $A \in \mathcal{A}$, $B \in \mathcal{B}$, and $E \in \mathcal{E}$, respectively. The task for Alice and Bob is to share a common random variable almost independent of Eve's random variable $E$ by using public communication. For this purpose, we assume that Alice and Bob can perform local data processing on both sides and Alice can send messages to Bob via a public channel. That is, only one-way communication is allowed. We call such a combination of these operations a code and denote it by $\Phi$.

The quality is evaluated by three quantities: the size of the final common random variable, the probability that their final variables coincide, and Eve's distinguishability $d_1(\Phi|E)$ of the final joint distribution between Alice and Eve.

In order to construct a protocol for this task, we assume that the set $\mathcal{A}$ has a module structure (any finite set can be regarded as a cyclic group). Then, the objective of secret key distillation can be realized by applying the code of a wire-tap channel as follows. First, Alice generates another uniform random variable $X$ and sends the random variable $X' := X + A$. Then, the distribution of the random variables $B, X'$ ($E, X'$) accessible to Bob (Eve) can be regarded as the output distribution of the channel $x \mapsto W^B_x$ ($x \mapsto W^E_x$). The channels $W^B$ and $W^E$ are given as follows:

$$W^B_x(x', b) = P^{A,B}(x' - x, b), \quad W^E_x(x', e) = P^{A,E}(x' - x, e), \tag{87}$$

where $P^{A,B}(a, b)$ ($P^{A,E}(a, e)$) is the joint probability between Alice's initial random variable $A$ and Bob's (Eve's) initial random variable $B$ ($E$). Hence, the channel $W^E$ is general additive.
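The reduction to a general additive channel can be checked on a toy example: with $W^E_x(x', e) = P^{A,E}(x' - x, e)$ on a cyclic group and uniform input, one expects $e^{\phi(t|W^E,p_{\mathrm{mix}})} = |\mathcal{A}|^t e^{\phi(t|A|E|P^{A,E})}$. The sketch below is an illustration only; the joint distribution `P_AE` on $\mathbb{Z}_3 \times \{0, 1\}$ and the value $t = 0.4$ are hypothetical.

```python
def exp_phi_source(t, p_ae):
    """exp(phi(t|A|E|P^{A,E})) for p_ae[a][e]: sum_e (sum_a P(a,e)^(1/(1-t)))^(1-t)."""
    n_a, n_e = len(p_ae), len(p_ae[0])
    return sum(sum(p_ae[a][e] ** (1.0 / (1.0 - t)) for a in range(n_a)) ** (1.0 - t)
               for e in range(n_e))

def exp_phi_channel(t, p_ae):
    """exp(phi(t|W^E,p_mix)) for W^E_x(x', e) = P^{A,E}(x' - x mod n, e), uniform x."""
    n_a, n_e = len(p_ae), len(p_ae[0])
    total = 0.0
    for x_out in range(n_a):          # Eve observes x' = x + a ...
        for e in range(n_e):          # ... together with her own variable e
            inner = sum((1.0 / n_a) * p_ae[(x_out - x) % n_a][e] ** (1.0 / (1.0 - t))
                        for x in range(n_a))
            total += inner ** (1.0 - t)
    return total

# Hypothetical joint distribution of (A, E) with A in Z_3 and E in {0, 1}.
P_AE = [[0.30, 0.05],
        [0.10, 0.20],
        [0.05, 0.30]]
T = 0.4
CHANNEL_SIDE = exp_phi_channel(T, P_AE)
SOURCE_SIDE = len(P_AE) ** T * exp_phi_source(T, P_AE)
```

The factor $|\mathcal{A}|^t$ is the price of the uniformly random mask $X$: the sacrificed amount $L$ must exceed it for the resulting bound on Eve's distinguishability to become small.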



Applying Theorem 5 to the uniform distribution $P_{\mathrm{mix}}$, for any numbers $M$ and $L$, due to (74), there exists a code $\Phi$ such that $|\Phi| = M$ and

$$\epsilon_B(\Phi) \le 3 \min_{0 \le s \le 1} (ML)^s |\mathcal{A}|^{-s} e^{\phi(-s|A|B|P^{A,B})}, \tag{88}$$
$$d_1(\Phi|E) \le 9 \min_{0 \le t \le 1/2} \frac{|\mathcal{A}|^t e^{\phi(t|A|E|P^{A,E})}}{L^t}. \tag{89}$$

In particular, when the joint distribution between $A$ and $B$ ($E$) is the $n$-fold independent and identical distribution (i.i.d.) of $P^{A,B}$ ($P^{A,E}$), respectively, the relation $\phi(t|A_n|E_n|(P^{A,E})^n) = n\phi(t|A|E|P^{A,E})$ holds. Thus, there exists a code $\Phi_n$ for any integers $L_n$, $M_n$, and any probability distribution $p$ on $\mathcal{X}$ such that $|\Phi_n| = M_n$ and

$$\epsilon_B(\Phi_n) \le 3 \min_{0 \le s \le 1} (M_n L_n)^s |\mathcal{A}|^{-ns} e^{n\phi(-s|A|B|P^{A,B})}, \tag{90}$$
$$d_1(\Phi_n|E) \le 9 \min_{0 \le t \le 1/2} \frac{|\mathcal{A}|^{nt} e^{n\phi(t|A|E|P^{A,E})}}{L_n^t}. \tag{91}$$

Finally, we mention the relation with the previous paper [17]. Since the above discussion is an application of Section VIII, the same comparison as Remark 1 is valid. Hence, our evaluation (91) is strictly better than that given in [17] except for the special case.

X. Discussion 

We have derived a tight evaluation of the exponent for the average of the $L_1$ norm distance between the generated random number and the uniform random number when universal hash functions are applied and the key generation rate is less than the critical rate $R_1$. Using this evaluation, we have obtained an upper bound for Eve's distinguishability in secret key generation from a common random number without communication when universal hash functions are applied. Since our bound is based on the Rényi entropy of order $1 + s$ for $s \in [0, 1]$, it can be regarded as an extension of the result of Bennett et al. [2] with the Rényi entropy of order 2.

Applying this bound to the wire-tap channel, we obtain an upper bound for Eve's distinguishability, which yields an exponential upper bound. This exponent improves on the existing exponent [17]. Further, when the error correction code is given by a linear code and when the channel is additive or general additive, the privacy amplification is given by a concatenation of a Toeplitz matrix and the identity matrix. This method can be applied to secret key distillation with public communication.

Appendix A
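Proof of Theorem 2

First, for a fixed element $a \in \Omega$, we introduce the following condition for a hash function $f_X$:

Condition 5 (Condition $[a, \Omega]$): $f_X(a) \ne f_X(a')$ for $\forall a' (\ne a) \in \Omega$.

Let $P[a, \Omega]$ be the probability that Condition $[a, \Omega]$ holds. Due to the strongly universal condition, it is evaluated as $P[a, \Omega] \ge 1 - \frac{|\Omega|}{M}$. When we denote the expectation concerning the hash functions under Condition $[a, \Omega]$ by $E_{X|[a,\Omega]}$, the convexity of the function $x \mapsto |x|$ yields that

$$E_{X|[a,\Omega]} \Big| P^A(a) + \sum_{a' (\ne a) \in f_X^{-1}(f_X(a))} P^A(a') - \frac{1}{M} \Big| \ge \Big| P^A(a) + E_{X|[a,\Omega]} \sum_{a' (\ne a) \in f_X^{-1}(f_X(a))} P^A(a') - \frac{1}{M} \Big|$$
$$= \Big| P^A(a) + \frac{1}{M} \sum_{a' \in \Omega^c} P^A(a') - \frac{1}{M} \Big| = \Big| P^A(a) + \frac{1}{M}(1 - P^A(\Omega)) - \frac{1}{M} \Big| = \Big| P^A(a) - \frac{1}{M} P^A(\Omega) \Big|.$$

Thus,

$$E_X d_1(P^{f_X(A)}) \ge \sum_{a \in \Omega} P[a, \Omega]\, E_{X|[a,\Omega]} \Big| P^A(a) + \sum_{a' (\ne a) \in f_X^{-1}(f_X(a))} P^A(a') - \frac{1}{M} \Big|$$
$$\ge \sum_{a \in \Omega} \Big(1 - \frac{|\Omega|}{M}\Big) \Big| P^A(a) - \frac{1}{M} P^A(\Omega) \Big| \ge \Big(1 - \frac{|\Omega|}{M}\Big) \Big( P^A(\Omega) - \frac{|\Omega|}{M} P^A(\Omega) \Big) = \Big(1 - \frac{|\Omega|}{M}\Big)^2 P^A(\Omega).$$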
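The final inequality of this appendix, $E_X d_1(P^{f_X(A)}) \ge (1 - \frac{|\Omega|}{M})^2 P^A(\Omega)$, can be sanity-checked by brute force on a tiny example. The sketch below makes several assumptions that are not taken from the source: it uses the ensemble of all functions from $\mathcal{A}$ to $\{0, \ldots, M-1\}$ as a stand-in for a strongly universal family, it takes $d_1(P^{f(A)})$ to be the $L_1$ distance to the uniform distribution, and the distribution `P_A` is hypothetical.

```python
from itertools import combinations, product

def d1_to_uniform(p_a, f, m):
    """L1 distance between the distribution of f(A) and the uniform distribution on m values."""
    out = [0.0] * m
    for a, pr in enumerate(p_a):
        out[f[a]] += pr
    return sum(abs(q - 1.0 / m) for q in out)

def expected_d1(p_a, m):
    """Average of d_1 over all functions from {0..|A|-1} to {0..m-1} (uniformly chosen)."""
    functions = list(product(range(m), repeat=len(p_a)))
    return sum(d1_to_uniform(p_a, f, m) for f in functions) / len(functions)

def lower_bounds(p_a, m):
    """(1 - |Omega|/M)^2 * P(Omega) for every nonempty subset Omega with |Omega| <= M."""
    bounds = []
    for size in range(1, min(len(p_a), m) + 1):
        for omega in combinations(range(len(p_a)), size):
            mass = sum(p_a[a] for a in omega)
            bounds.append((1.0 - size / m) ** 2 * mass)
    return bounds

# Hypothetical source distribution on three symbols, hashed to M = 4 values.
P_A = [0.5, 0.3, 0.2]
M = 4
AVERAGE = expected_d1(P_A, M)
BEST_LOWER_BOUND = max(lower_bounds(P_A, M))
```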



Appendix B
Proof of Lemma 6

We choose $s(R)$ such that $\frac{d(sH_{1+s}(A|P))}{ds}\big|_{s=s(R)} = H(P_{1+s(R)}) + D(P_{1+s(R)}\|P) = R$, where $P_{1+s}(a) := \frac{P(a)^{1+s}}{\sum_{a'} P(a')^{1+s}}$.

When $Q$ satisfies $H(Q) + D(Q\|P) = R$, writing $s = s(R)$, we have

$$D(Q\|P) - D(P_{1+s}\|P) = \sum_a Q(a)(\log Q(a) - \log P(a)) - \sum_a P_{1+s}(a)(\log P_{1+s}(a) - \log P(a))$$
$$= \sum_a Q(a)(\log Q(a) - \log P_{1+s}(a)) + \sum_a (Q(a) - P_{1+s}(a))(\log P_{1+s}(a) - \log P(a))$$
$$= D(Q\|P_{1+s}) + s \sum_a (Q(a) - P_{1+s}(a)) \log P(a)$$
$$= D(Q\|P_{1+s}) + s\big(H(P_{1+s}) + D(P_{1+s}\|P) - H(Q) - D(Q\|P)\big) = D(Q\|P_{1+s}) \ge 0.$$

Hence,

$$\min_{Q: H(Q)+D(Q\|P)=R} H(Q) + 2D(Q\|P) - R = \min_{Q: H(Q)+D(Q\|P)=R} D(Q\|P) = D(P_{1+s(R)}\|P)$$
$$= s(R)H_{1+s(R)}(A|P) - s(R)\frac{d(sH_{1+s}(A|P))}{ds}\Big|_{s=s(R)} = s(R)H_{1+s(R)}(A|P) - s(R)R = \max_{0 \le s} sH_{1+s}(A|P) - sR.$$

The last equation follows from the concavity of $sH_{1+s}(A|P)$ concerning $s$.

Assume that $\frac{d(sH_{1+s}(A|P))}{ds}\big|_{s=1} \le R$. Then, $s(R) \le 1$. When $R' \ge R$,

$$\min_{Q: H(Q)+D(Q\|P)=R'} H(Q) + 2D(Q\|P) - R = \max_{0 \le s} sH_{1+s}(A|P) - sR' + R' - R$$
$$\ge s(R)H_{1+s(R)}(A|P) - s(R)R' + R' - R \ge s(R)H_{1+s(R)}(A|P) - s(R)R$$
$$= \max_{0 \le s} sH_{1+s}(A|P) - sR = \max_{0 \le s \le 1} sH_{1+s}(A|P) - sR,$$

which implies (51).

Assume that $\frac{d(sH_{1+s}(A|P))}{ds}\big|_{s=1} \ge R$. When $R' \ge R$,

$$\min_{Q: H(Q)+D(Q\|P)=R'} H(Q) + 2D(Q\|P) - R = \max_{0 \le s} sH_{1+s}(A|P) - sR' + R' - R \ge 1 \cdot H_{1+1}(A|P) - R' + R' - R = H_2(A|P) - R.$$

Further, when $R' = \frac{d(sH_{1+s}(A|P))}{ds}\big|_{s=1}$,

$$\min_{Q: H(Q)+D(Q\|P)=R'} H(Q) + 2D(Q\|P) - R = H_{1+1}(A|P) - R' + R' - R = H_2(A|P) - R,$$

which implies (52).

Further, the concavity of $s \mapsto sH_{1+s}(A|P)$ and the condition $\frac{d(sH_{1+s}(A|P))}{ds}\big|_{s=1} \ge R$ imply that $\max_{0 \le s \le 1} sH_{1+s}(A|P) - sR = H_2(A|P) - R$. Thus, we obtain (53).
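The identity proved in this appendix, $\min_{Q: H(Q)+D(Q\|P) \ge R} H(Q) + 2D(Q\|P) - R = \max_{0 \le s \le 1} sH_{1+s}(A|P) - sR$, can be checked by grid search for a binary distribution. The sketch below is an illustration under assumptions: $P = (0.2, 0.8)$, the two rates (one on each side of $\frac{d(sH_{1+s})}{ds}|_{s=1} \approx 0.305$), and the grid resolutions are hypothetical choices, and natural logarithms are used.

```python
import math

def renyi_side(p, rate, steps=2000):
    """max_{0<=s<=1} sH_{1+s}(A|P) - sR, with sH_{1+s} = -log(p^(1+s) + (1-p)^(1+s))."""
    return max(-math.log(p ** (1 + i / steps) + (1 - p) ** (1 + i / steps)) - i / steps * rate
               for i in range(steps + 1))

def type_side(p, rate, steps=200000):
    """min over Q = (q, 1-q) with H(Q) + D(Q||P) >= R of H(Q) + 2 D(Q||P) - R."""
    best = math.inf
    for i in range(1, steps):
        q = i / steps
        cross = -q * math.log(p) - (1 - q) * math.log(1 - p)   # H(Q) + D(Q||P)
        if cross >= rate:
            div = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))  # D(Q||P)
            best = min(best, cross + div - rate)
    return best

P = 0.2
# Rate above the derivative at s = 1: the maximiser s(R) lies inside [0, 1].
HIGH = (type_side(P, 0.40), renyi_side(P, 0.40))
# Rate below the derivative at s = 1: both sides equal H_2(A|P) - R.
LOW = (type_side(P, 0.25), renyi_side(P, 0.25))
H2_MINUS_R = -math.log(P ** 2 + (1 - P) ** 2) - 0.25
```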
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